1. This International Searching Authority
(i) considers that there are ___ (number of) inventions claimed in the international application covered
by the claims indicated below/on the extra sheet:

and it considers that the international application does not comply with the requirements of unity of invention
(Rules 13.1, 13.2 and 13.3) for the reasons indicated below/on the extra sheet:

______


(ii) [x] has carried out a partial international search (see Annex) [ ] will establish the international search report
on those parts of the international application which relate to the invention first mentioned in claims Nos.:

SEE PCT/ISA/206 

(iii) will establish the international search report on the other parts of the international application only if, and to the extent 
to which, additional fees are paid 

2. The applicant is hereby invited, within the time limit indicated above, to pay the amount indicated below: 


Fee per additional invention 


1

number of additional inventions 


EUR 945,00 

total amount of additional fees 


Or, 


The applicant is informed that, according to Rule 40.2(c), the payment of any additional fee may be made under protest, 

i.e., a reasoned statement to the effect that the international application complies with the requirement of unity of invention 
or that the amount of the required additional fee is excessive. 


3. [ ] Claim(s) Nos. ___ have been found to be unsearchable under
Article 17(2)(b) because of defects under Article 17(2)(a) and therefore have not been included with any invention.

This International Searching Authority found multiple (groups of)
inventions in this international application, as follows: 

1. Claims: 1-3,10,11,26 partially, 4-6 completely 

Invention 1: 

A synthetic nucleic acid sequence which encodes a protein 
wherein at least one non-common codon or less-common codon 
has been replaced by a common codon; said synthetic nucleic 
acid encoding a Factor VIII; a vector comprising said 
synthetic nucleic acid; a cell comprising said vector; a 
method for preparing a synthetic nucleic acid sequence which 
is at least 90 codons in length. 

1.1. Claims: 1-3,10,11,26 partially, 7-9 completely

Invention 2:

A synthetic nucleic acid sequence which encodes a protein
wherein at least one non-common codon or less-common codon
has been replaced by a common codon; said synthetic nucleic
acid encoding a Factor IX; a vector comprising said
synthetic nucleic acid; a cell comprising said vector; a
method for preparing a synthetic nucleic acid sequence which
is at least 90 codons in length.

2. Claims: 1-3,26 partially, 12-25 completely 

Invention 3: 

A synthetic nucleic acid sequence which encodes a protein 
wherein at least one non-common codon or less-common codon 
has been replaced by a common codon; said synthetic nucleic 
acid encoding an alpha-galactosidase; a vector comprising
said synthetic nucleic acid; a cell comprising said vector; 
methods for producing alpha-galactosidase or preparing a 
synthetic nucleic acid encoding the same or providing a 
subject with the same; a method for preparing a synthetic 
nucleic acid sequence which is at least 90 codons in length. 

Please note that all inventions mentioned under item 1, although not 
necessarily linked by a common inventive concept, could be searched 
without effort justifying an additional fee. 

The application lacks unity of invention as required by Articles
3(4)(iii) and 17(3)(a) PCT for the following reasons:

The inventions as defined above relate to various synthetic nucleic 
acids whose expression is enhanced. 


The method for replacing non-common or less-common codons in nucleic
acids by common codons to enhance expression is already known in the
art, in particular for Factor VIII and Factor IX (please see, e.g.,
WO 98/12207, page 1, lines 7-15; page 5, lines 23-25; page 45, line 23 -
page 46, line 11).

In light of this prior art the above mentioned common concept is not 
novel and the problem underlying the present application can be 
redefined as: 

The provision of additional, synthetic nucleic acids with replacement of 
non-common or less-common codons by common codons. The above inventions 
1 to 3 are different solutions to this problem. 

Due to the fact that the method for replacing non-common or less-common
codons in nucleic acids by common codons to enhance expression is known
in the prior art and due to the fact that no other technical feature can
be distinguished which, in the light of the prior art, could be regarded
as a special technical feature in the sense of Rule 13.2 PCT due to the
[...] nucleic acids, there is no single general inventive concept underlying
the plurality of claimed inventions of the present application in the
sense of Rule 13.1 PCT.

Consequently, the application lacks unity of invention, and the
different inventions are formulated as the different subjects in
the communication pursuant to Art. 17(3)(a) PCT.

It should be noted that, in regrouping the different inventions, the
ISA has taken into account the balance between the necessary search
effort and the levying of additional fees; i.e., in the present case, an
international search report has been established for inventions 1 and 2
without asking for an additional fee for invention 2.

Attention is drawn to the fact that further objections concerning
absence of unity of invention with respect to invention 3 may be raised,
depending on the result of the respective search.

results of the international search established on the parts of the international application which relate to the invention

2. This communication is not the international search report which will be established according to Article 18 and Rule 43.

OPTIMIZED MESSENGER RNA

Cross Reference To Related Applications 

This application is a continuation-in-part of U.S. Serial Number 09/407,605, filed
September 28, 1999, which claims the benefit of prior U.S. provisional application 60/102,239,
filed September 29, 1998, and prior U.S. provisional application 60/130,241, filed April 20,
1999, the contents of which are herein incorporated by reference.

Field of the Invention 

The invention is directed to methods for optimizing the properties of mRNA molecules, 
optimized mRNA molecules, methods of using optimized mRNA molecules, and compositions 
which include optimized mRNA molecules. 

Background of the Invention 

In eukaryotes, gene expression is affected, in part, by the stability and structure of the
messenger RNA (mRNA) molecule. mRNA stability influences gene expression by affecting the 
steady-state level of the mRNA. It can affect the rates at which the mRNA disappears following 
transcriptional repression and accumulates following transcriptional induction. The structure and 
nucleotide sequence of the mRNA molecule can also influence the efficiency with which these 
individual mRNA molecules are translated. 

The intrinsic stability of a given mRNA molecule is influenced by a number of specific 
internal sequence elements which can exert a destabilizing effect on the mRNA. These elements 
may be located in any region of the transcript, e.g., in the 5' untranslated region
(5'UTR), in the coding region and in the 3' untranslated region (3'UTR). It is well established
that shortening of the poly(A) tail initiates mRNA decay (Ross, Trends in Genetics, 12:171-175,
1996). The poly(A) tract influences cytoplasmic mRNA stability by protecting mRNA from
rapid degradation. Adenosine and uridine rich elements (AUREs) in the 3'UTR are also
associated with unstable mammalian mRNAs. It has been demonstrated that proteins that bind
to AUREs, AURE-binding proteins (AUBPs), can affect mRNA stability. The coding region can
also alter the half-life of many RNAs. For example, the coding region can interact with proteins
that protect it from endonucleolytic attack. Furthermore, the efficiency with which individual
mRNA molecules are translated has a strong influence on the stability of the mRNA molecule
(Herrick et al., Mol Cell Biol. 10, 2269-2284, 1990, and Hoekema et al., Mol Cell Biol. 7, 2914-2924, 1987).

The single-stranded nature of mRNA allows it to adopt secondary and tertiary structure in 
a sequence-dependent manner through complementary base pairing. Examples of such structures 
include RNA hairpins, stem loops and more complex structures such as bifurcations, 
pseudoknots and triple-helices. These structures influence both mRNA stability, e.g., the stem
loop elements in the 3'UTR can serve as an endonuclease cleavage site, and affect translational
efficiency.

In addition to the structure of the mRNA, the nucleotide content of the mRNA can also 
play a role in the efficiency with which the mRNA is translated. For example, mRNA with a 
high GC content at the 5' untranslated region (UTR) may be translated with low efficiency and a
reduced translational effect can reduce message stability. Thus, altering the sequence of an
mRNA molecule can ultimately influence mRNA transcript stability, by influencing the
translational stability of the message. 
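
As a rough illustration of the nucleotide-content point above, the GC fraction of a 5' UTR can be computed directly from its sequence. The sketch below is only an example under stated assumptions: the function name gc_fraction and the example sequence are hypothetical, and the text gives no numeric threshold for what counts as "high" GC content.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return sum(1 for base in s if base in "GC") / len(s)


# Hypothetical 5' UTR fragment, for illustration only (not from the application).
example_utr = "GCGGCCGCAUGC"
print(f"GC fraction: {gc_fraction(example_utr):.2f}")
```

Comparing this value before and after a sequence change is one simple way to see how an alteration shifts the GC content of the region.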

Factor VIII and Factor IX are important plasma proteins that participate in the intrinsic 
pathway of blood coagulation. Their dysfunction or absence in individuals can result in blood 
coagulation disorders, e.g., a deficiency of Factor VIII or Factor IX results in Hemophilia A or
B, respectively. Isolating Factor VIII or Factor IX from blood is difficult, e.g., the isolation of 
Factor VIII is characterized by low yields, and also has the associated danger of being 
contaminated with infectious agents such as Hepatitis B virus, Hepatitis C virus or HIV. 
Recombinant DNA technology provides an alternative method for producing biologically active 
Factor VIII or Factor IX. While these methods have had some success, improving the yield of 
Factor VIII or Factor IX is still a challenge. 

An approach to increasing protein yield using recombinant DNA technology is to modify 
the coding sequence of a protein of interest, e.g., Factor VIII or Factor IX, without altering the 
amino acid sequence of the gene product. This approach involves altering, for example, the 
native Factor VIII or Factor IX gene sequence such that codons which are not so frequently
used in mammalian cells are replaced with codons which are overrepresented in highly expressed
mammalian genes. Seed et al. (WO 98/12207) used this approach with a measure of success.
They found that substituting the rare mammalian codons with those frequently used in
mammalian cells results in a four-fold increase in Factor VIII production from mammalian cells.
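
The codon-replacement approach described above can be sketched in a few lines. This is a minimal illustration, not the method of Seed et al. or of the present application: the PREFERRED_CODON table is a hypothetical choice of one "common" codon per amino acid, only a handful of standard genetic code entries are included, and the input sequence is invented.

```python
# Standard genetic code entries for the few codons used in this sketch.
CODON_TO_AA = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
}

# ASSUMPTION: one "common" codon per amino acid, chosen for illustration only.
PREFERRED_CODON = {"A": "GCC", "L": "CTG", "K": "AAG", "E": "GAG"}


def split_codons(cds):
    """Split a coding sequence into a list of three-base codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[c] for c in split_codons(cds))


def replace_with_common_codons(cds):
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    out = []
    for codon in split_codons(cds):
        amino_acid = CODON_TO_AA.get(codon)
        # Codons with no listed preference (e.g. ATG) are left unchanged.
        out.append(PREFERRED_CODON.get(amino_acid, codon))
    return "".join(out)


native = "ATGGCTTTAAAAGAA"  # invented example sequence
optimized = replace_with_common_codons(native)
# The encoded amino acid sequence must not change.
assert translate(native) == translate(optimized)
print(optimized)
```

The assertion captures the constraint stated in the text: the coding sequence changes while the amino acid sequence of the gene product stays the same.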

Summary of the Invention 

In one aspect, the invention features a synthetic nucleic acid sequence which encodes a
protein, or a portion thereof, wherein at least one non-common codon or less-common codon has 
been replaced by a common codon, and wherein the synthetic nucleic acid sequence includes a 
continuous stretch of at least 90 codons all of which are common codons. 
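
The "continuous stretch of at least 90 codons" condition can be checked with a single pass over the codons. The sketch below assumes the caller supplies the set of codons regarded as common; the function name longest_common_run and the two-codon COMMON set are hypothetical and serve only to make the example runnable.

```python
def longest_common_run(codons, common_codons):
    """Length of the longest uninterrupted run of codons found in common_codons."""
    best = current = 0
    for codon in codons:
        if codon.upper() in common_codons:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a non-common codon breaks the stretch
    return best


# ASSUMPTION: a toy set of "common" codons, for illustration only.
COMMON = {"GCC", "CTG"}
codons = ["GCC"] * 95 + ["GCT"] + ["CTG"] * 10
print(longest_common_run(codons, COMMON) >= 90)
```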

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. In 
a preferred embodiment, the continuous stretch of common codons can include: the sequence of 
a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of at least 90, 95, 100, 125, 150, 200, 250, 300 or more codons all of which are common 
codons. 

In another preferred embodiment, the nucleic acid sequence encoding a protein has at 
least 30, 50, 60, 75, 100, 200 or more non-common or less-common codons replaced with a 
common codon. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic nucleic
acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic nucleic
acid sequence.

In a preferred embodiment, all of the non-common or less-common codons of the
synthetic nucleic acid sequence encoding a protein have been replaced with common codons.

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in 
length. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all, of 
the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the
codons in the synthetic nucleic acid sequence are common codons. 
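
The percentage criterion above amounts to counting common codons and dividing by the total. The following is a minimal sketch under stated assumptions: common_codon_fraction is a hypothetical helper, and the one-codon set passed to it is a placeholder for whatever set of common codons is being used.

```python
def common_codon_fraction(cds, common_codons):
    """Fraction of codons in a coding sequence that belong to common_codons."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be non-empty and a multiple of 3 long")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return sum(1 for c in codons if c in common_codons) / len(codons)


# Invented sequence: 99 copies of GCC followed by one GCT.
sequence = "GCC" * 99 + "GCT"
fraction = common_codon_fraction(sequence, {"GCC"})  # placeholder codon set
print(fraction >= 0.94)
```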

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human
protein. 

In another aspect, the invention features a synthetic nucleic acid sequence which encodes
a protein, or a portion thereof, wherein at least one non-common codon or less-common codon
has been replaced by a common codon, and wherein the synthetic nucleic acid sequence includes
a continuous stretch of common codons, which continuous stretch includes at least 33% or more
of the codons in the synthetic nucleic acid sequence.

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. In 
a preferred embodiment, the continuous stretch of common codons can include: the sequence of 
a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of common codons wherein the continuous stretch includes at least 35%, 40%, 50%, 
60%, 70%, 80%, 90%, 95% or 100% of codons in the synthetic nucleic acid sequence. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic nucleic
acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic nucleic
acid sequence.

In a preferred embodiment, all of the non-common or less-common codons of the 
synthetic nucleic acid sequence encoding a protein have been replaced with common codons. 

In a preferred embodiment, all non-common and less-common codons are replaced with 
common codons. 

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in 
length. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all, of 
the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the 
codons in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In another aspect, the invention features a synthetic nucleic acid sequence which encodes
a protein, or a portion thereof, wherein at least one non-common codon or less-common codon
has been replaced by a common codon, and wherein the number of non-common and less-common
codons, taken together, is less than n/x, wherein n/x is a positive integer, n is the
number of codons in the synthetic nucleic acid sequence and x is chosen from 2, 4, 6, 10, 15, 20,
50, 150, 250, 500 and 1000. (Fractional values for n/x are rounded to the next highest or lowest
integer: positive values below 0.5 are rounded down and values above 0.5 are rounded up.)
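
The n/x limit and its rounding rule can be made concrete with integer arithmetic. The sketch below rests on two assumptions that the text does not settle: a fractional part of exactly 0.5 is rounded up, and a quotient that rounds to zero is rejected because n/x is required to be a positive integer. The function names rounded_limit and below_limit are hypothetical.

```python
ALLOWED_X = (2, 4, 6, 10, 15, 20, 50, 150, 250, 500, 1000)


def rounded_limit(n, x):
    """Round n/x to the nearest integer using exact integer arithmetic."""
    if x not in ALLOWED_X:
        raise ValueError("x must be one of the listed values")
    if n <= 0:
        raise ValueError("n must be a positive codon count")
    whole, remainder = divmod(n, x)
    # ASSUMPTION: an exact .5 fraction rounds up; the text does not specify this case.
    return whole + 1 if 2 * remainder >= x else whole


def below_limit(non_common_count, n, x):
    """True if the count of non-common and less-common codons is less than n/x."""
    limit = rounded_limit(n, x)
    if limit < 1:
        # ASSUMPTION: a limit of zero is treated as invalid (n/x must be positive).
        raise ValueError("n/x must round to a positive integer")
    return non_common_count < limit


# 100 codons with x = 6: 16.67 rounds up to 17.
print(rounded_limit(100, 6))
print(below_limit(16, 100, 6))
```

Using divmod avoids floating-point comparisons near the 0.5 boundary.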

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. In 
a preferred embodiment, the continuous stretch of common codons can include: the sequence of 
a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre"
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In a preferred embodiment, the number of codons in the synthetic nucleic acid sequence 
(n) is at least 50, 60, 70, 80, 90, 100, 120, 150, 200, 350, 400, 500 or more. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In preferred embodiments, the non-common and less-common codons replaced, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic nucleic
acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic nucleic
acid sequence.

In a preferred embodiment, all non-common or less-common codons are replaced with 
common codons. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons 
in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In another aspect, the invention features a synthetic nucleic acid sequence which encodes
a protein, or a portion thereof, wherein at least one non-common codon or less-common codon 
has been replaced by a common codon in the sequence that has not been optimized (non- 
optimized) which encodes the protein, wherein at least 94% or more of the codons in the 
sequence encoding the protein are common codons and wherein the synthetic nucleic acid 
sequence encodes a protein of at least about 90, 100 or 120 amino acids in length. 


WO 02/064799

PCT/US01/42655


The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA. 
In a preferred embodiment, the continuous stretch of common codons can include: the sequence 
of a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, 99.5% or more of 
non-common or less-common codons in the non-optimized nucleic acid sequence encoding the 
protein have been replaced by a common codon encoding the same amino acid. Preferably, all 
non-common or all less-common codons are replaced by a common codon encoding the same
amino acid as found in the non-optimized sequence. 

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in
length.

In other preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, 99.5% 
of the non-common codons in the non-optimized nucleic acid sequence are replaced with 
common codons. Preferably, all of the non-common codons are replaced with the common 
codons. 

In other preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 98%, 99%, 99.5% 
of the less-common codons in the non-optimized nucleic acid sequence are replaced with 
common codons. Preferably, all of the less-common codons are replaced with the common 
codons. 

In preferred embodiments, at least 94% or more of the non-common and less common 
codons are replaced with common codons. 

In preferred embodiments, the number of codons replaced which are not common codons 
is equal to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1. 

In preferred embodiments, the number of codons remaining which are not common 
codons is equal to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

The synthetic nucleic acid can direct the synthesis of an optimized messenger mRNA.
In a preferred embodiment, the continuous stretch of common codons can include: the sequence 
of a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the "pre" 
sequence of a pre-pro-protein; the "pre-pro" sequence of a pre-pro-protein; the "pro" sequence of 
a pre-pro or a pro-protein; or a portion of any of the aforementioned sequences. 

In a preferred embodiment, the synthetic nucleic acid sequence is at least 100, 110, 120,
150, 200, 300, 500, 700, 1000 or more base pairs in length.

In another aspect, the invention features a synthetic nucleic acid sequence that directs the 
synthesis of an optimized message which encodes a Factor VIII protein having one or more of
the following characteristics: 

a) the B domain is deleted (BDD Factor VIII);

b) the synthetic nucleic acid sequence has a recognition site for an intracellular
protease of the PACE/furin class, e.g., X-Arg-X-X-Arg (Molloy et al., J. Biol. Chem.
267:16396-16401, 1992); a short-peptide linker, e.g., a two peptide linker, e.g., a leucine-glutamic
acid peptide linker (LE), a three, or a four peptide linker, inserted at the heavy-light chain
junction;

c) the synthetic nucleic acid sequence is introduced into a cell, e.g., a primary cell, a 
secondary cell, a transformed or an immortalized cell line. Examples of an immortalized human 
cell line useful in the present method include, but are not limited to: a Bowes Melanoma cell
(ATCC Accession No. CRL 9607), a Daudi cell (ATCC Accession No. CCL 213), a HeLa cell 
and a derivative of a HeLa cell (ATCC Accession Nos. CCL 2, CCL2.1, and CCL 2.2), a HL-60 
cell (ATCC Accession No. CCL 240), a HT-1080 cell (ATCC Accession No. CCL 121), a Jurkat 
cell (ATCC Accession No. TIB 152), a KB carcinoma cell (ATCC Accession No. CCL 17), a K- 
562 leukemia cell (ATCC Accession No. CCL 243), a MCF-7 breast cancer cell (ATCC 
Accession No. BTH 22), a MOLT-4 cell (ATCC Accession No. 1582), a Namalwa cell (ATCC 
Accession No. CRL 1432), a Raji cell (ATCC Accession No. CCL 86), a RPMI 8226 cell 
(ATCC Accession No. CCL 155), a U-937 cell (ATCC Accession No. CRL 1593), WI-38VA13 
sub line 2R4 cells (ATCC Accession No. CLL 75.1), a CCRF-CEM cell (ATCC Accession No. 
CCL 119) and a 2780AD ovarian carcinoma cell (Van Der Blick et al., Cancer Res. 48: 5927- 
5932, 1988), as well as heterohybridoma cells produced by fusion of human cells and cells of 
another species. In another embodiment, the immortalized cell line can be a cell line other than a
human cell line, e.g., a CHO cell line or a COS cell line. In a preferred embodiment, the cell is a
non-transformed cell. In a preferred embodiment, the cell can be from a clonal cell strain. In
various preferred embodiments, the cell is a mammalian cell, e.g., a primary or secondary
mammalian cell, e.g., a fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an
epithelial cell, an endothelial cell, a glial cell, a neural cell, a cell comprising a formed element
of the blood, a muscle cell and precursors of these somatic cells. In a most preferred
embodiment, the cell is a secondary human fibroblast.

In a preferred embodiment, the synthetic nucleic acid sequence which encodes a factor 
VIII protein has at least one, preferably at least two, and most preferably, all of the 
characteristics a, b, and c described above. 
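
The X-Arg-X-X-Arg recognition motif named in characteristic b) can be located in an amino acid sequence with a simple pattern search. The sketch below is illustrative only: it assumes one-letter amino acid codes, treats X as any residue, and uses an invented peptide; the name find_furin_like_sites is hypothetical and the search says nothing about whether a site is actually cleaved.

```python
import re

# X-Arg-X-X-Arg in one-letter code; the lookahead lets overlapping sites be reported.
MOTIF = re.compile(r"(?=([A-Z]R[A-Z]{2}R))")


def find_furin_like_sites(protein):
    """Return (start_index, matched_residues) for each X-Arg-X-X-Arg occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein.upper())]


# Invented peptide, for illustration only.
print(find_furin_like_sites("AAHRSKRLE"))
```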

In preferred embodiments, at least one non-common codon or less-common codon of the
synthetic nucleic acid has been replaced by a common codon and the synthetic nucleic acid has
one or more of the following properties: it has a continuous stretch of at least 90 codons all of 
which are common codons; it has a continuous stretch of common codons which comprise at 
least 33% of the codons of the synthetic nucleic acid sequence; at least 94% or more of the 
codons in the sequence encoding the protein are common codons and the synthetic nucleic acid 
sequence encodes a protein of at least about 90, 100, or 120 amino acids in length; it is at least 80 
base pairs in length and is free of unique restriction endonuclease sites that would occur in the 
message optimized sequence. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic
nucleic acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic
nucleic acid sequence.

In a preferred embodiment, all non-common or less-common codons are replaced with
common codons. 

In a preferred embodiment, all non-common and less-common codons are replaced with 
common codons. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. 

Preferably, all of the codons in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of common codons wherein the continuous stretch comprises at least 35%, 40%, 50%, 
60%, 70%, 80%, 90%, 95% or 100% of codons in the synthetic nucleic acid sequence. 

In another aspect, the invention features a synthetic nucleic acid sequence which can
direct the synthesis of an optimized message which encodes a Factor IX protein having one or 
more of the following characteristics: 

a) it has a PACE/furin site, such as an X-Arg-X-X-Arg site, at a pro-peptide mature
protein junction; or

b) is inserted, e.g., via transfection, into a non-transformed cell, e.g., a primary or 
secondary cell, e.g., a primary human fibroblast. 

In a preferred embodiment, the synthetic nucleic acid sequence which encodes a factor IX 
protein has at least one, and preferably, both of the characteristics a) and b) described above. 

In preferred embodiments, at least one non-common codon or less-common codon of the 
synthetic nucleic acid has been replaced by a common codon and the synthetic nucleic acid has 
one or more of the following properties: it has a continuous stretch of at least 90 codons all of 
which are common codons; it has a continuous stretch of common codons which comprise at 
least 33% of the codons of the synthetic nucleic acid sequence; at least 94% or more of the 
codons in the sequence encoding the protein are common codons and the synthetic nucleic acid 
sequence encodes a protein of at least about 90, 100, or 120 amino acids in length; it is at least 80
base pairs in length and is free of unique restriction endonuclease sites that occur in the
message optimized sequence. 

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic nucleic
acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, 1% of the codons in the synthetic nucleic
acid sequence.

In a preferred embodiment, all non-common or less-common codons are replaced with 
common codons. 

In a preferred embodiment, all non-common and less-common codons are replaced with 
common codons. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. 

Preferably, all of the codons in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of common codons wherein the continuous stretch comprises at least 35%, 40%, 50%, 
60%, 70%, 80%, 90%, 95% or 100% of codons in the synthetic nucleic acid sequence. 

In another aspect, the invention features a synthetic nucleic acid sequence which can 
direct the synthesis of an optimized message which encodes α-galactosidase.

In a preferred embodiment, the synthetic nucleic acid sequence which encodes α-galactosidase
is inserted, e.g., via transfection, into a non-transformed cell, e.g., a primary or
secondary cell, e.g., a primary human fibroblast.

In preferred embodiments, at least one non-common codon or less-common codon of 
the synthetic nucleic acid has been replaced by a common codon and the synthetic nucleic acid 
has one or more of the following properties: it has a continuous stretch of at least 90 codons all 
of which are common codons; it has a continuous stretch of common codons which comprise at 
least 33% of the codons of the synthetic nucleic acid sequence; at least 94% or more of the 
codons in the sequence encoding the protein are common codons and the synthetic nucleic acid 
sequence encodes a protein of at least about 90, 100, or 120 amino acids in length; it is at least 80 
base pairs in length and is free of unique restriction endonuclease sites that occur in the 
message-optimized sequence. 
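
The sequence properties recited above (the fraction of common codons, the longest continuous stretch of common codons, and the share of the sequence that the stretch covers) can be computed directly from a coding sequence. The following Python sketch is illustrative only and is not part of the original disclosure. The helper name `common_codon_stats` is hypothetical, the codon set is the common-codon list given in the definitions of this specification, and counting ATG (Met) and TGG (Trp) as common codons is an assumption made here because each is the only codon for its amino acid.

```python
# Common codons as listed in this specification's definition of "common codon".
# Assumption: ATG (Met) and TGG (Trp) are also counted as common, since each is
# the only codon for its amino acid; the specification's list does not name them.
COMMON = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GGC", "CAC", "ATC", "CTG",
    "AAG", "CCC", "TTC", "AGC", "ACC", "TAC", "GAG", "GTG", "ATG", "TGG",
}


def common_codon_stats(coding_seq):
    """Return (fraction_common, longest_run, run_fraction) for a coding sequence.

    The sequence is read in frame from its first base and a trailing partial
    codon is ignored. Hypothetical helper, for illustration only.
    """
    seq = coding_seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        return 0.0, 0, 0.0
    longest = current = 0
    for codon in codons:
        # Extend the current run on a common codon, otherwise reset it.
        current = current + 1 if codon in COMMON else 0
        longest = max(longest, current)
    n_common = sum(codon in COMMON for codon in codons)
    return n_common / len(codons), longest, longest / len(codons)


# Seven codons, one of which (GCT) is not a common codon.
fraction, run, run_fraction = common_codon_stats("ATGGCCCTGAAGGCTGTGTTC")
print(fraction >= 0.94, run >= 90, run_fraction >= 0.33)
```

A stop codon, if present, would be counted as not common by this sketch, so the sequence is assumed to be passed without its stop codon.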

In a preferred embodiment, the number of non-common or less-common codons replaced 
is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In a preferred embodiment, the number of non-common or less-common codons 
remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1. 

In preferred embodiments, the non-common and less-common codons replaced, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In preferred embodiments, the non-common and less-common codons remaining, taken 
together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic 
acid sequence. 

In a preferred embodiment, all non-common or less-common codons are replaced with 
common codons. 

In a preferred embodiment, all non-common and less-common codons are replaced with 
common codons. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. 

Preferably, all of the codons in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous 
stretch of common codons wherein the continuous stretch comprises at least 35%, 40%, 50%, 
60%, 70%, 80%, 90%, 95% or 100% of codons in the synthetic nucleic acid sequence. 

In another aspect, the invention features a plasmid or a DNA construct, e.g., an 
expression plasmid or a DNA construct, which includes a synthetic nucleic acid sequence 
described herein. 

In yet another aspect, the invention features a synthetic nucleic acid sequence described 
herein introduced into the genome of an animal cell. In a preferred embodiment, the animal cell 
is a primate cell, e.g., a mammal cell, e.g., a human cell. 

In still another aspect, the invention features a cell harboring a synthetic nucleic acid 
sequence described herein, e.g., a cell from a primary or secondary cell strain, or a cell from a 
continuous cell line, e.g., a Bowes Melanoma cell (ATCC Accession No. CRL 9607), a Daudi 
cell (ATCC Accession No. CCL 213), a HeLa cell and a derivative of a HeLa cell (ATCC 
Accession Nos. CCL 2, CCL 2.1, and CCL 2.2), a HL-60 cell (ATCC Accession No. CCL 240), a 
HT-1080 cell (ATCC Accession No. CCL 121), a Jurkat cell (ATCC Accession No. TIB 152), a 
KB carcinoma cell (ATCC Accession No. CCL 17), a K-562 leukemia cell (ATCC Accession 
No. CCL 243), a MCF-7 breast cancer cell (ATCC Accession No. BTH 22), a MOLT-4 cell 
(ATCC Accession No. 1582), a Namalwa cell (ATCC Accession No. CRL 1432), a Raji cell 
(ATCC Accession No. CCL 86), a RPMI 8226 cell (ATCC Accession No. CCL 155), a U-937 
cell (ATCC Accession No. CRL 1593), a WI-38VA13 sub line 2R4 cell (ATCC Accession No. 
CLL 75.1), a CCRF-CEM cell (ATCC Accession No. CCL 119) and a 2780AD ovarian 
carcinoma cell (Van Der Blick et al., Cancer Res. 48: 5927-5932, 1988), as well as 
heterohybridoma cells produced by fusion of human cells and cells of another species. In another 
embodiment, the immortalized cell line can be a cell line other than a human cell line, e.g., a 
CHO cell line or a COS cell line. In a preferred embodiment, the cell is a non-transformed cell. 
In a preferred embodiment, the cell is from a clonal cell strain. In various preferred 
embodiments, the cell is a mammalian cell, e.g., a primary or secondary mammalian cell, e.g., a 
fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an endothelial 
cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a muscle cell 
and precursors of these somatic cells. In a most preferred embodiment, the cell is a secondary 
human fibroblast. 

In another aspect, the invention features a method for preparing a synthetic nucleic acid 
sequence encoding a protein which is, preferably, at least 90 codons in length, e.g., a synthetic 
nucleic acid sequence described herein. The method includes identifying non-common and 
less-common codons in the non-optimized gene encoding the protein and replacing at least 94%, 
95%, 96%, 97%, 98%, 99% or more of the non-common and less-common codons with a 
common codon encoding the same amino acid as the replaced codon. Preferably, all 
non-common and less-common codons are replaced with common codons. 
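
The replacement step of this method can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the procedure actually used to build the constructs described herein. It implements only the preferred variant in which all non-common and less-common codons are replaced. The function name `optimize` is hypothetical, the common codon for each amino acid is taken from the definitions of this specification, and the standard genetic code is assumed. Met, Trp and stop codons are left unchanged.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ("*" is a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Common codon for each amino acid, as listed in this specification.
COMMON_BY_AA = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "E": "GAG", "V": "GTG",
}


def optimize(coding_seq):
    """Replace every non-common or less-common codon with the common codon
    encoding the same amino acid (hypothetical helper, illustration only).

    Met, Trp and stop codons have no entry in COMMON_BY_AA and are kept as-is.
    """
    seq = coding_seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        amino_acid = CODON_TO_AA[codon]
        out.append(COMMON_BY_AA.get(amino_acid, codon))
    return "".join(out)


# GCT (Ala), CTT (Leu) and AAA (Lys) are replaced; ATG and the stop codon are kept.
print(optimize("ATGGCTCTTAAATGA"))
```

Because each codon is swapped only for a codon of the same amino acid, the encoded protein is unchanged by construction.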

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more codons in length. 

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a 
mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human 
protein. 

In another aspect, the invention features a method for making a nucleic acid sequence 
which directs the synthesis of an optimized message of a protein of at least 90, 100, or 120 amino 
acids in length, e.g., a synthetic nucleic acid sequence described herein. The method includes: 
synthesizing at least two fragments of the nucleic acid sequence, wherein the two fragments 
encode adjoining portions of the protein and wherein both fragments are mRNA optimized, e.g., 
as described herein; and joining the two fragments such that a non-common codon is not created 
at a junction point, thereby making the mRNA optimized nucleic acid sequence. 
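
The requirement that joining does not create a non-common codon at the junction can be checked in silico before fragments are synthesized. The Python sketch below is an assumption-laden illustration, not the assembly procedure of the Examples. The function name `join_fragments` is hypothetical, the upstream fragment is assumed to begin in frame, and the check is stricter than the text requires in that any codon touching the junction must be a common codon (ATG and TGG are counted as common by assumption).

```python
# Common codons from this specification, plus ATG and TGG (assumption: each is
# the only codon for its amino acid and is therefore treated as common).
COMMON = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GGC", "CAC", "ATC", "CTG",
    "AAG", "CCC", "TTC", "AGC", "ACC", "TAC", "GAG", "GTG", "ATG", "TGG",
}


def join_fragments(upstream, downstream):
    """Join two adjoining fragments and verify the codons at the junction.

    The upstream fragment is assumed to start at a codon boundary. Raises
    ValueError if a codon touching the junction is not a common codon.
    """
    if not upstream or not downstream:
        raise ValueError("both fragments must be non-empty")
    joined = (upstream + downstream).upper()
    cut = len(upstream)
    # Start positions of the codon holding the last upstream base and of the
    # codon holding the first downstream base (the same codon if the cut
    # falls inside a codon).
    starts = {(cut - 1) // 3 * 3, cut // 3 * 3}
    for start in sorted(starts):
        codon = joined[start:start + 3]
        if len(codon) == 3 and codon not in COMMON:
            raise ValueError(f"codon {codon} at position {start} is not a common codon")
    return joined


# The cut falls inside the Leu codon; C + TG reconstitutes the common codon CTG.
print(join_fragments("ATGGCCC", "TGAAG"))
```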

In a preferred embodiment, the two fragments are joined together such that a unique 
restriction endonuclease site used to create the two fragments is not recreated at the junction 
point. In another preferred embodiment, the two fragments are joined together such that a 
unique restriction site is created. 

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at 
least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more codons in length. 

In a preferred embodiment, at least 3, 4, 5, 6, 7, 8, 9, 10 or more fragments of the 
nucleic acid sequence are synthesized. 

In a preferred embodiment, the fragments are joined together by a fusion, e.g., a blunt end 
fusion. 

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the 
codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons 
in the synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the number of codons which are not common codons is equal 
to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1. 

In preferred embodiments, each fragment is at least 30, 40, 50, 75, 100, 120, 150 or more 
codons in length. 

In another aspect, the invention features a method of providing a subject, e.g., a human, 
with a protein. The method includes: providing a synthetic nucleic acid sequence that can 
direct the synthesis of an optimized message for a protein, e.g., a synthetic nucleic acid sequence 
described herein; introducing the synthetic nucleic acid sequence that directs the synthesis of an 
optimized message for a protein into the subject; and allowing the subject to express the protein, 
thereby providing the subject with the protein. 

In preferred embodiments, the method further includes inserting the nucleic acid 
sequence that can direct the synthesis of an optimized message into a cell. The cell can be an 
autologous, allogeneic, or xenogeneic cell, but is preferably autologous. A preferred cell is a 
fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an endothelial 
cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a muscle cell and 
precursors of these somatic cells. The mRNA optimized synthetic nucleic acid sequence can be 
inserted into the cell ex vivo or in vivo. If inserted ex vivo, the cell can be introduced into the 
subject. 

In preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons 
in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons in the 
synthetic nucleic acid sequence are common codons. 

In preferred embodiments, the number of codons which are not common codons is equal 
to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1. 

The invention also features synthetic nucleic acid fragments which encode a portion of a 
protein. Such synthetic nucleic acid fragments are similar to the synthetic nucleic acid sequences 
of the invention except that they encode only a portion of a protein. Such nucleic acid fragments 
preferably encode at least 50, 60, 70, 80, 100, 110, 120, 130, 150, 200, 300, 400, 500, or more 
contiguous amino acids of the protein. 

The invention also features transfected or infected primary and secondary somatic cells of 
vertebrate origin, particularly of mammalian origin, e.g., of human, mouse, or rabbit origins, e.g., 
primary human cells, secondary human cells, or primary or secondary rabbit cells. The cells are 
transfected or infected with exogenous synthetic nucleic acid, e.g., DNA, described herein. The 
synthetic nucleic acid can encode a protein, e.g., a therapeutic protein, e.g., an enzyme, e.g., 
α-galactosidase, a cytokine, a hormone, an antigen, an antibody, a clotting factor, e.g., Factor VIII, 
Factor IX, or a regulatory protein. The invention also includes methods by which primary and 
secondary cells are transfected or infected to include exogenous synthetic DNA, methods of 
producing clonal cell strains or heterogenous cell strains, and methods of gene therapy in which 
the transfected or infected primary or secondary cells are used. The synthetic nucleic acid directs 
the synthesis of an optimized message, e.g., an optimized message as described herein. 

The present invention includes primary and secondary somatic cells, which have been 
transfected or infected with an exogenous synthetic nucleic acid described herein, which is stably 
integrated into their genomes or is expressed in the cells episomally. In preferred embodiments 
the cells are fibroblasts, keratinocytes, epithelial cells, endothelial cells, glial cells, neural cells, 
cells comprising a formed element of the blood, muscle cells, other somatic cells which can be 
cultured, or somatic cell precursors. The resulting cells are referred to, respectively, as 
transfected or infected primary cells and transfected or infected secondary cells. The exogenous 
synthetic DNA encodes a protein, or a portion thereof, e.g., a therapeutic protein (e.g., Factor 
VIII or Factor IX). In the embodiment in which the exogenous synthetic DNA encodes a 
protein, or a portion thereof, to be expressed by the recipient cells, the resulting protein can be 
retained within the cell, incorporated into the cell membrane or secreted from the cell. In this 
embodiment, the exogenous synthetic DNA encoding the protein is introduced into cells along 
with additional DNA sequences sufficient for expression of the exogenous synthetic DNA in the 
cells. The additional DNA sequences may be of viral or non-viral origin. Primary cells 
modified to express exogenous synthetic DNA are referred to herein as transfected or infected 
primary cells, which include cells removed from tissue and placed on culture medium for the 
first time. Secondary cells modified to express or render available exogenous DNA are referred 
to herein as transfected or infected secondary cells. 

Primary and secondary cells transfected or infected by the subject method, e.g., cloned 
cell strains, can be seen to fall into three types or categories: 1) cells which do not, as obtained, 
make or contain the therapeutic protein, 2) cells which make or contain the therapeutic protein 
but in lower quantities than normal (in quantities less than the physiologically normal lower 
level) or in defective form, and 3) cells which make the therapeutic protein at physiologically 
normal levels, but are to be augmented or enhanced in their content or production. Examples of 
proteins that can be made by the present method include cytokines or clotting factors. 

Exogenous synthetic DNA is introduced into primary or secondary cells by a variety of 
techniques. For example, a DNA construct which includes exogenous synthetic DNA encoding 
a therapeutic protein and additional DNA sequences necessary for expression in recipient cells 
can be introduced into primary or secondary cells by electroporation, microinjection, or other 
means (e.g., calcium phosphate precipitation, modified calcium phosphate precipitation, 
polybrene precipitation, liposome fusion, receptor-mediated DNA delivery). Alternatively, a 
vector, such as a retroviral or other vector which includes exogenous synthetic DNA can be used 
and cells can be genetically modified as a result of infection with the vector. 

In addition to the exogenous synthetic DNA, transfected or infected primary and 
secondary cells may optionally contain DNA encoding a selectable marker, which is expressed 
and confers upon recipients a selectable phenotype, such as antibiotic resistance, resistance to a 
cytotoxic agent, nutritional prototrophy or expression of a surface protein. Its presence makes it 
possible to identify and select cells containing the exogenous DNA. A variety of selectable 
marker genes can be used, such as neo, gpt, dhfr, ada, pac, hyg, mdr and hisD. 

Transfected or infected cells of the present invention are useful, as populations of 
transfected or infected primary cells or secondary cells, transfected or infected clonal cell strains, 
transfected or infected heterogenous cell strains, and as cell mixtures in which at least one 
representative cell of one of the three preceding categories of transfected or infected cells is 
present (e.g., the mixture of cells contains essentially transfected or infected primary or 
secondary cells and may include untransfected or uninfected primary or secondary cells), as a 
delivery system for treating an individual with an abnormal or undesirable condition which 
responds to delivery of a therapeutic protein, which is either: 1) a therapeutic protein (e.g., a 
protein which is absent, underproduced relative to the individual's physiologic needs, defective, 
or inefficiently or inappropriately utilized in the individual), e.g., Factor VIII or Factor IX; or 2) a 
therapeutic protein with novel functions, such as enzymatic or transport functions, such as 
α-galactosidase. In the method of the present invention of providing a therapeutic protein, 
transfected or infected primary cells or secondary cells, clonal cell strains or heterogenous cell 
strains, are administered to an individual in whom the abnormal or undesirable condition is to be 
treated or prevented, in sufficient quantity and by an appropriate route, to express the exogenous 
synthetic DNA at physiologically relevant levels. A physiologically relevant level is one which 
either approximates the level at which the product is produced in the body or results in 
improvement of the abnormal or undesirable condition. 

Clonal cell strains of transfected or infected secondary cells (referred to as transfected or 
infected clonal cell strains) expressing exogenous synthetic DNA (and, optionally, including a 
selectable marker gene) can be produced by the method of the present invention. The method 
includes the steps of: 1) providing a population of primary cells, obtained from the individual to 
whom the transfected or infected primary cells will be administered or from another source; 2) 
introducing into the primary cells or into secondary cells derived from primary cells a DNA 
construct which includes exogenous DNA as described above and the necessary additional DNA 
sequences described above, producing transfected or infected primary or secondary cells; 3) 
maintaining transfected or infected primary or secondary cells under conditions appropriate for 
their propagation; 4) identifying a transfected or infected primary or secondary cell; and 5) 
producing a colony from the transfected or infected primary or secondary cell identified in (4) by 
maintaining it under appropriate culture conditions until a desired number of cells is obtained. 
The desired number of clonal cells is a number sufficient to provide a therapeutically effective 
amount of product when administered to an individual, e.g., an individual with hemophilia A is 
provided with a population of cells that produce a therapeutically effective amount of Factor 
VIII, such that the condition is treated. The individual can also be, for example, an 
individual with hemophilia B or an individual with a deficiency of α-galactosidase such as an 
individual with Fabry disease. The number of cells required for a given therapeutic dose 
depends on several factors including the expression level of the protein, the condition of the 
host animal and the limitations associated with the implantation procedure. In general, the 
number of cells required for implantation is in the range of 1x10^6 to 5x10^9, and preferably 1x10^8 
to 5x10^8. In one embodiment of the method, the cell identified in (4) undergoes approximately 
27 doublings (i.e., undergoes 27 cycles of cell growth and cell division) to produce 100 million 
clonal transfected or infected cells. In another embodiment of the method, exogenous synthetic 
DNA is introduced into genomic DNA by homologous recombination between DNA sequences 
present in the DNA construct and genomic DNA. In another embodiment, the exogenous 
synthetic DNA is present episomally in a transfected cell, e.g., primary or secondary cell. 
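
The figure of approximately 27 doublings follows from simple arithmetic: a single founder cell that doubles n times yields 2^n cells, and 27 is the smallest n for which 2^n reaches 100 million. The short Python sketch below illustrates this. The function name `doublings_needed` is hypothetical, and the calculation assumes ideal doubling with no cell loss, which is a simplification of real culture conditions.

```python
def doublings_needed(target_cells, founders=1):
    """Smallest number of population doublings needed for `founders` cells to
    reach at least `target_cells`, assuming every cell divides at each cycle
    and none are lost (an idealization)."""
    doublings, cells = 0, founders
    while cells < target_cells:
        cells *= 2
        doublings += 1
    return doublings


print(doublings_needed(100_000_000))  # 27
print(2 ** 26, 2 ** 27)               # 67108864 134217728
```

Twenty-six doublings give about 67 million cells, short of the target, while twenty-seven give about 134 million.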

In one embodiment of producing a clonal population of transfected secondary cells, a cell 
suspension containing primary or secondary cells is combined with exogenous synthetic DNA 
encoding a therapeutic protein and DNA encoding a selectable marker, such as the neo gene. 
The two DNA sequences are present on the same DNA construct or on two separate DNA 
constructs. The resulting combination is subjected to electroporation, generally at 250-300 volts 
with a capacitance of 960 μFarads and an appropriate time constant (e.g., 14 to 20 msec) for 
cells to take up the DNA construct. In an alternative embodiment, microinjection is used to 
introduce the DNA construct into primary or secondary cells. In either embodiment, 
introduction of the exogenous DNA results in production of transfected primary or secondary 
cells. The exogenous synthetic DNA introduced into the cell can be stably integrated into 
genomic DNA or can be present episomally in the cell. 

In the method of producing heterogenous cell strains of the present invention, the same 
steps are carried out as described for production of a clonal cell strain, except that a single 
transfected primary or secondary cell is not isolated and used as the founder cell. Instead, two or 
more transfected primary or secondary cells are cultured to produce a heterogenous cell strain. A 
heterogenous cell strain can also contain, in addition to two or more transfected primary or 
secondary cells, untransfected primary or secondary cells. 

The methods described herein have wide applicability in treating abnormal or undesired 
conditions and can be used to provide a variety of proteins in an effective amount to an 
individual. For example, they can be used to provide secreted proteins (with either 
predominantly systemic or predominantly local effects, e.g., Factor VIII and Factor IX), 
membrane proteins (e.g., for imparting new or enhanced cellular responsiveness, facilitating 
removal of a toxic product or for marking or targeting to a cell) or intracellular proteins (e.g., 
for affecting gene expression or producing autocrine effects). 

A method described herein is particularly advantageous in treating abnormal or undesired 
conditions in that it: 1) is curative (one gene therapy treatment has the potential to last a patient's 
lifetime); 2) allows precise dosing (the patient's cells continuously determine and deliver the 
optimal dose of the required protein based on physiologic demands, and the stably transfected or 
infected cell strains can be characterized extensively in vitro prior to implantation, leading to 
accurate predictions of long term function in vivo); 3) is simple to apply in treating patients; 4) 
eliminates issues concerning patient compliance (following a one-time gene therapy treatment, 
daily protein injections are no longer necessary); and 5) reduces treatment costs (since the 
therapeutic protein is synthesized by the patient's own cells, investment in costly protein production 
and purification is unnecessary). 

As used herein, the term "optimized messenger RNA" refers to a synthetic nucleic acid 
sequence encoding a protein wherein at least one non-common codon or less-common codon in 
the sequence encoding the protein has been replaced with a common codon. 

By "common codon" is meant the most common codon representing a particular amino 
acid in a human sequence. The codon frequency in highly expressed human genes is outlined 
below in Table 1. Common codons include: Ala (gcc); Arg (cgc); Asn (aac); Asp (gac); Cys 
(tgc); Gln (cag); Gly (ggc); His (cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe (ttc); Ser 
(agc); Thr (acc); Tyr (tac); Glu (gag); and Val (gtg) (see Table 1). "Less-common codons" are 
codons that occur frequently in humans but are not the common codon: Gly (ggg); Ile (att); Leu 
(ctc); Ser (tcc); Val (gtc); and Arg (agg). All codons other than common codons and 
less-common codons are "non-common codons". 
TABLE 1: Codon Frequency in Highly Expressed Human Genes 

Amino acid    Codons (% occurrence) 
Ala           GCC 53; GCT 17; GCA 13; GCG 17 
Arg           CGC 37; CGT 7; CGA 6; CGG 21; AGA 10; AGG 18 
Asn           AAC 78; AAT 22 
Asp           GAC 75; GAT 25 
Cys           TGC 68; TGT 32 
Gln           CAA 12; CAG 88 
Glu           GAA 25; GAG 75 
Gly           GGC 50; GGT 12; GGA 14; GGG 24 
His           CAC 79; CAT 21 
Ile           ATC 77; ATT 18; ATA 5 
Leu           CTC 26; CTT 5; CTA 3; CTG 58; TTA 2; TTG 6 
Lys           AAA 18; AAG 82 
Pro           CCC 48; CCT 19; CCA 16; CCG 17 
Phe           TTC 80; TTT 20 
Ser           TCC 28; TCT 13; TCA 5; TCG 9; AGC 34; AGT 10 
Thr           ACC 57; ACT 14; ACA 14; ACG 15 
Tyr           TAC 74; TAT 26 
Val           GTC 25; GTT 7; GTA 5; GTG 64 

Codon frequency in Table 1 was calculated using the GCG program established by the 
University of Wisconsin Genetics Computer Group. Numbers represent the percentage of cases 
in which the particular codon is used. 
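
Percentages of the kind shown in Table 1 express, for each amino acid, the share of its occurrences that are encoded by a given codon. The Python sketch below illustrates that calculation on arbitrary in-frame coding sequences. It is not the GCG program and does not reproduce its options or its input data. The function name `codon_usage_percent` is hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ("*" is a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage_percent(coding_seqs):
    """Return {amino_acid: {codon: percent}} for a collection of in-frame
    coding sequences (hypothetical helper, illustration only).

    Percentages are rounded to whole numbers, so the values for one amino
    acid may not sum to exactly 100.
    """
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    # Total number of occurrences of each amino acid across all codons.
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n
    usage = defaultdict(dict)
    for codon, n in counts.items():
        amino_acid = CODON_TO_AA[codon]
        usage[amino_acid][codon] = round(100 * n / totals[amino_acid])
    return dict(usage)


# Toy input: alanine is encoded three times by GCC and once by GCT.
print(codon_usage_percent(["GCCGCCGCCGCT"]))
```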

The term "primary cell" includes cells present in a suspension of cells isolated from a 
vertebrate tissue source (prior to their being plated, i.e., attached to a tissue culture substrate such 
as a dish or flask), cells present in an explant derived from tissue, both of the previous types of 
cells plated for the first time, and cell suspensions derived from these plated cells. The term 
"secondary cell" or "cell strain" refers to cells at all subsequent steps in culturing. That is, the first 
time a plated primary cell is removed from the culture substrate and replated (passaged), it is 
referred to herein as a secondary cell, as are all cells in subsequent passages. Secondary cells 
are cell strains which consist of secondary cells which have been passaged one or more times. A 
cell strain consists of secondary cells that: 1) have been passaged one or more times; 2) exhibit a 
finite number of mean population doublings in culture; 3) exhibit the properties of contact- 
inhibited, anchorage dependent growth (anchorage-dependence does not apply to cells that are 
propagated in suspension culture); and 4) are not immortalized. A "clonal cell strain" is defined 
as a cell strain that is derived from a single founder cell. A "heterogenous cell strain" is defined 
as a cell strain that is derived from two or more founder cells. 

The term "transfected cell" refers to a cell into which an exogenous synthetic nucleic acid 
sequence, e.g., a sequence which encodes a protein, is introduced. Once in the cell, the synthetic 
nucleic acid sequence can integrate into the recipient cell's chromosomal DNA or can exist 
episomally. Standard transfection methods can be used to introduce the synthetic nucleic acid 
sequence into a cell, e.g., liposome-, polybrene- or DEAE dextran-mediated transfection, 
electroporation, calcium phosphate precipitation or microinjection. The 
term "transfection" does not include delivery of DNA or RNA into a cell by a virus. 

The term "infected cell" refers to a cell into which an exogenous synthetic nucleic acid sequence, 
e.g., a sequence which encodes a protein, is introduced by a virus. Viruses known to be useful 
for gene transfer include an adenovirus, an adeno-associated virus, a herpes virus, a mumps 
virus, a poliovirus, a retrovirus, a Sindbis virus, a lentivirus and a vaccinia virus such as a canary 
pox virus. 

Other features and advantages of the invention will be apparent from the following 
detailed description and the claims. 

Detailed Description of the Invention 

The drawings are first briefly described. 

Figure 1 is a schematic representation of domain structures of full-length and B-domain 
deleted human Factor VIII (hFVIII). 

Figure 2 is a schematic representation of full-length hFVIII. 

Figure 3 is a schematic representation of 5R BDD hFVIII expression plasmid pXF8.186. 

Figure 4 is a schematic representation of LE BDD hFVIII expression plasmid pXF8.61. 

Figure 5 is a schematic representation of the fourteen fragments (Fragment A-Fragment 
N) assembled to construct pXF8.61. 

Figure 6 is a schematic representation of the assembly of pXF8.61. 

Figure 7 depicts the nucleotide sequence and the corresponding amino acid sequence of 
the LE B-domain-deleted-Factor VIII (FVIII) insert contained in pAM1-1 (SEQ ID NO:1). 

Figure 8 is a schematic representation of the fragments assembled to construct pXF8.186. 

Figure 9 depicts the nucleotide sequence and the corresponding amino acid sequence of 
the 5Arg B-domain-deleted-FVIII insert (SEQ ID NO:2). 

Figure 10 is a schematic representation of the Factor VIII expression plasmid, pXF8.36. 
The cytomegalovirus immediate early I (CMV) promoter is depicted as a lightly shaded box. 
Positions of splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. 
The Factor VIII cDNA sequence is depicted as a solid dark box. The hGH 3'UTS region is 
depicted as an open box. The neo expression cassette is depicted as a shaded box with an 
arrowhead which corresponds to the direction of transcription. The thin dark line represents the 
plasmid backbone sequences. The position and direction of transcription of the β-lactamase gene 
(amp) is indicated by the solid boxed arrow. 

Figure 11 is a schematic representation of the Factor VIII expression plasmid, pXF8.38. 
The cytomegalovirus immediate early I (CMV) promoter is depicted as a lightly shaded box. 
Positions of splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. 
The Factor VIII cDNA sequence is depicted as a solid dark box. The hGH 3'UTS region is 
depicted as an open box. The neo expression cassette is depicted as a shaded box with an 
arrowhead which corresponds to the direction of transcription. The thin dark line represents the 
plasmid backbone sequences. The position and direction of transcription of the β-lactamase gene 
(amp) is indicated by the solid boxed arrow. 

Figure 12 is a schematic representation of the Factor VIII expression plasmid, pXF8.269. 
The collagen (I) α2 promoter is depicted as a striped box. The region representing 
aldolase-derived 5' untranslated sequences is depicted as a lightly shaded box. Positions of splice donor 
(SD) and splice acceptor (SA) sites are indicated below the shaded box. The Factor VIII cDNA 
sequence is depicted as a solid dark box. The hGH 3'UTS region is depicted as an open box. 
The neo expression cassette is depicted as a shaded box with an arrowhead which corresponds to 
the direction of transcription. The thin dark line represents the plasmid backbone sequences. 
The position and direction of transcription of the β-lactamase gene (amp) is indicated by the 
solid boxed arrow. 

Figure 13 is a schematic representation of the Factor VIII expression plasmid, pXF8.224. 
The collagen (I) α2 promoter is depicted as a striped box. The region representing 
aldolase-derived 5' untranslated sequences is depicted as a lightly shaded box. Positions of splice donor 
(SD) and splice acceptor (SA) sites are indicated below the shaded box. The Factor VIII cDNA 
sequence is depicted as a solid dark box. The hGH 3'UTS region is depicted as an open box. 
The neo expression cassette is depicted as a shaded box with an arrowhead which corresponds to 
the direction of transcription. The thin dark line represents the plasmid backbone sequences. 
The position and direction of transcription of the β-lactamase gene (amp) is indicated by the 
solid boxed arrow. 

Figure 14 is a schematic representation of the fragments assembled to construct 
pFIXABCD. The restriction sites that are cut are in bold and the junctions from the last step are 
underlined. The direction of transcription of the FIXABCD sequence is indicated by the solid 
black arrow. 

Figure 15 depicts the nucleotide sequence of the FIXABCD insert (SEQ ID NO:105). 

Figure 16 is a schematic representation of the Factor IX expression plasmids pXIX76 and 
pXIX170. The arrows inside the circle denote open reading frames. Arrows on the circle denote 
promoter sequences; a double headed arrow denotes an enhancer. Thin lines denote bacterial 
vector sequences or introns and thick boxes delineate the translated sequence. Double lines 
denote untranscribed genomic sequences, while lines of intermediate thickness denote 
untranslated portions of the mRNA. Plasmid pXIX170 has a Factor IX cDNA sequence that is 
optimized, while pXIX76 does not. 

Figure 17 depicts the nucleotide sequence of the α-galactosidase insert (SEQ ID NO:106). 

Figure 18 is a schematic representation of the α-galactosidase expression plasmids 
pXAG94 and pXAG95. The arrows inside the circle denote open reading frames. Arrows on the 
circle denote promoter sequences; a double headed arrow denotes an enhancer. Thin lines 
denote bacterial vector sequences or introns and thick boxes delineate the translated sequence. 
Double lines denote untranscribed genomic sequences, while lines of intermediate thickness 
denote untranslated portions of the mRNA. Plasmid pXAG95 has an α-galactosidase cDNA 
sequence that is optimized, while pXAG94 does not. 

Figure 19 is a schematic representation of the α-galactosidase expression plasmids 
pXAG73 and pXAG74. The arrows inside the circle denote open reading frames. Arrows on the 
circle denote promoter sequences; a double headed arrow denotes an enhancer. Thin lines 
denote bacterial vector sequences or introns and thick boxes delineate the translated sequence. 
Double lines denote untranscribed genomic sequences, while lines of intermediate thickness 
denote untranslated portions of the mRNA. Plasmid pXAG74 has an α-galactosidase cDNA 
sequence that is optimized, while pXAG73 does not. 

Message Optimization 

Methods of the invention are directed to optimized messages and synthetic nucleic acid 
sequences which direct the production of optimized mRNAs. An optimized mRNA can direct 
the synthesis of a protein of interest, e.g., a human protein, e.g., a human Factor VIII, human 
Factor IX or human α-galactosidase. A message for a protein of interest, e.g., human Factor VIII, 
human Factor IX or human α-galactosidase, can be optimized as described herein, e.g., by 
replacing at least 94%, 95%, 96%, 97%, 98%, 99%, and preferably all of the non-common 
codons or less-common codons with a common codon encoding the same amino acid as outlined 
in Table 1. 
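
As a minimal sketch of the replacement step described above, the following Python fragment swaps every codon for the common codon of the same amino acid and reports what fraction of the codons is already common. The three-amino-acid table is an illustrative assumption standing in for Table 1, and the function names are hypothetical.

```python
# Illustrative subset only (an assumption); the full assignments are those of Table 1.
CODON_TO_AA = {
    "GGC": "Gly", "GGT": "Gly", "GGA": "Gly", "GGG": "Gly",
    "CTG": "Leu", "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "TTA": "Leu", "TTG": "Leu",
    "AAG": "Lys", "AAA": "Lys",
}
COMMON_CODON = {"Gly": "GGC", "Leu": "CTG", "Lys": "AAG"}


def split_codons(seq):
    """Split a coding sequence into triplets, rejecting partial codons."""
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3].upper() for i in range(0, len(seq), 3)]


def replace_with_common(seq):
    """Replace each codon with the common codon for the same amino acid."""
    return "".join(COMMON_CODON[CODON_TO_AA[c]] for c in split_codons(seq))


def fraction_common(seq):
    """Fraction of codons in seq that are already common codons."""
    codons = split_codons(seq)
    common = set(COMMON_CODON.values())
    return sum(c in common for c in codons) / len(codons)
```

Because each codon is exchanged only for a synonym, the encoded amino acid sequence is unchanged; `fraction_common` can be used to check whether a given threshold (e.g., 94%) has been reached.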

The coding region of a synthetic nucleic acid sequence can include the sequence "CG" 
without any discrimination, if the sequence is found in the common codon for that amino acid. 
Alternatively, the sequence "CG" can be limited in various regions, e.g., the first 20% of the 
coding sequence can be designed to have a low incidence of the sequence "CG". 
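
The incidence referred to above can be checked computationally. The sketch below assumes that the two-base sequence in question is the CG dinucleotide and counts its occurrences in the leading fraction of a coding sequence; the function names and the default of 20% are illustrative choices, not part of the described method.

```python
def cg_count(seq):
    """Count occurrences of the dinucleotide CG (assumed reading of the sequence)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cg_in_leading_fraction(seq, fraction=0.2):
    """Count CG dinucleotides in the first `fraction` of the codons of seq."""
    n_codons = len(seq) // 3
    # Keep at least one codon so that very short sequences are still examined.
    lead_nt = max(1, int(n_codons * fraction)) * 3
    return cg_count(seq[:lead_nt])
```

A candidate whose leading region exceeds a chosen count could then be redesigned in that region only, leaving the remainder of the coding sequence untouched.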

Optimizing a message (and its synthetic DNA sequence) can negatively or positively 
affect gene expression or protein production. For example, replacing a less-common codon with 
a more common codon may affect the half-life of the mRNA or alter its structure by introducing 
a secondary structure that interferes with translation of the message. It may therefore be 
necessary, in certain instances, to alter the optimized message. 

All or a portion of a message (or its gene) can be optimized. In some cases the desired 
modulation of expression is achieved by optimizing essentially the entire message. In other 
cases, the desired modulation will be achieved by optimizing part but not all of the message or 
gene. 

The codon usage of any coding sequence can be adjusted to achieve a desired property, 
for example high levels of expression in a specific cell type. The starting point for such an 
optimization may be a coding sequence with 100% common codons, or a coding sequence which 
contains a mixture of common and non-common codons. 

Two or more candidate sequences that differ in their codon usage are generated and 
tested to determine if they possess the desired property. Candidate sequences may be evaluated 
initially by using a computer to search for the presence of regulatory elements, such as silencers 
or enhancers, and to search for the presence of regions of coding sequence which could be 
converted into such regulatory elements by an alteration in codon usage. Additional criteria may 
include enrichment for particular nucleotides, e.g., A, C, G or U, codon bias for a particular 
amino acid, or the presence or absence of particular mRNA secondary or tertiary structure. 
Adjustment to the candidate sequence can be made based on a number of such criteria. 

Promising candidate sequences are constructed and then evaluated experimentally. 
Multiple candidates may be evaluated independently of each other, or the process can be 
iterative, either by using the most promising candidate as a new starting point, or by combining 
regions of two or more candidates to produce a novel hybrid. Further rounds of modification and 
evaluation can be included. 

Modifying the codon usage of a candidate sequence can result in the creation or 
destruction of either a positive or negative element. In general, a positive element refers to any 
element whose alteration or removal from the candidate sequence could result in a decrease in 
expression of the therapeutic protein, or whose creation could result in an increase in expression 
of a therapeutic protein. For example, a positive element can include an enhancer, a promoter, a 
downstream promoter element, a DNA binding site for a positive regulator (e.g., a transcriptional 
activator), or a sequence responsible for imparting or removing mRNA secondary or tertiary 
structure. A negative element refers to any element whose alteration or removal from the 
candidate sequence could result in an increase in expression of the therapeutic protein, or whose 
creation would result in a decrease in expression of the therapeutic protein. A negative element 
includes a silencer, a DNA binding site for a negative regulator (e.g., a transcriptional repressor), 
a transcriptional pause site, or a sequence that is responsible for imparting or removing mRNA 
secondary or tertiary structure. In general, a negative element arises more frequently than a 
positive element. Thus, any change in codon usage that results in an increase in protein 
expression is more likely to have arisen from the destruction of a negative element rather than 
the creation of a positive element. In addition, alteration of the candidate sequence is more likely 
to destroy a positive element than create a positive element. In one embodiment, a candidate 
sequence is chosen and modified so as to increase the production of a therapeutic protein. The 
candidate sequence can be modified, e.g., by sequentially altering the codons or by randomly 
altering the codons in the candidate sequence. A modified candidate sequence is then evaluated 
by determining the level of expression of the resulting therapeutic protein or by evaluating 
another parameter, e.g., a parameter correlated to the level of expression. A candidate sequence 
which produces an increased level of a therapeutic protein as compared to an unaltered candidate 

In another approach, one or a group of codons can be modified, e.g., without reference to 
protein or message structure, and tested. Alternatively, one or more codons can be chosen based on a 
message-level property, e.g., location in a region of predetermined, e.g., high or low, GC or AU 
content, location in a region having a structure such as an enhancer or silencer, location in a 
region that can be modified to introduce a structure such as an enhancer or silencer, location in a 
region having, or predicted to have, secondary or tertiary structure, e.g., intra-chain or 
inter-chain pairing, or location in a region lacking, or predicted to lack, secondary or tertiary 
structure, e.g., intra-chain or inter-chain pairing. A particular modified region is chosen if it 
produces the desired result. 

Methods which systematically generate candidate sequences are useful. For example, 
one or a group, e.g., a contiguous block of codons, at various positions of a synthetic nucleic acid 
sequence can be replaced with common codons (or with non-common codons, if for example, the 
starting sequence has been optimized) and the resulting sequence evaluated. Candidates can be 
generated by optimizing (or de-optimizing) a given "window" of codons in the sequence to 
generate a first candidate, and then moving the window to a new position in the sequence, and 
optimizing (or de-optimizing) the codons in the new position under the window to provide a 
second candidate. Candidates can be evaluated by determining the level of expression they 
provide, or by evaluating another parameter, e.g., a parameter correlated to the level of 
expression. Some parameters can be evaluated by inspection or computationally, e.g., the 
possession or lack thereof of high or low GC or AU content; a sequence element such as an 
enhancer or silencer; or secondary or tertiary structure, e.g., intra-chain or inter-chain pairing. 
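
The moving "window" described above can be sketched as follows. The function takes the starting sequence and a fully converted version of it as lists of codons and yields one candidate per window position; the name `window_candidates` and its parameters are hypothetical.

```python
def window_candidates(codons, converted, window, step=1):
    """Yield (start, candidate) pairs for a window moved along the sequence.

    codons    -- the starting sequence as a list of codons
    converted -- the same sequence with every codon optimized (or de-optimized)
    Each candidate equals the starting sequence except under the window,
    where the codons are taken from `converted`.
    """
    if len(codons) != len(converted):
        raise ValueError("both sequences must have the same number of codons")
    for start in range(0, len(codons) - window + 1, step):
        end = start + window
        yield start, codons[:start] + converted[start:end] + codons[end:]
```

Passing an optimized sequence as `codons` and the starting sequence as `converted` gives the de-optimizing variant of the same scan.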

Thus, hybrid messages, i.e., messages having a region which is optimized and a region 
which is not optimized, can be evaluated to determine if they have a desired property. The 
evaluation can be effected by, e.g., synthesizing the candidate message or messages, and 
determining a property such as its level of expression. Such a determination can be made in a 
cell-free system or in a cell-based system. The generation and testing of one or more candidates 
can also be performed by computational methods, e.g., on a computer. For example, a computer 
program can be used to generate a number of candidate messages and those messages analyzed 
by a computer program which predicts the existence of primary structure elements or secondary 
or tertiary structure. 

A candidate message can be generated by dividing a region into subregions and 
optimizing each subregion. An optimized subregion is then combined with a non-optimized 
subregion to produce a candidate. For example, a region is divided into three subregions, a, b 
and c, each of which is then optimized to provide optimized subregions a', b' and c'. The 
optimized subregions, a', b', and c' can then be combined with one or more of the non-optimized 
subregions, e.g., a, b and c. For example, ab'c could be formed and tested. Different 
combinations of optimized and non-optimized subregions can be generated. By evaluating a 
series of such hybrid candidate sequences, it is possible to analyze the effect of modification of 
different subregions and, e.g., to define the particular version of each subregion that contributes 
most to the desired property. A preferred candidate can include the versions of each subregion 
that performed best in a series of such experiments. 
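
The combinations of optimized and non-optimized subregions can be enumerated mechanically. The sketch below, with the hypothetical name `hybrid_candidates`, produces every mixture for any number of subregions; for three subregions it yields eight candidates, including ab'c.

```python
from itertools import product


def hybrid_candidates(native, optimized):
    """Yield (label, sequence) for every mix of native and optimized subregions.

    native and optimized map each subregion name (e.g. "a") to its sequence.
    A prime in the label marks a subregion taken from the optimized set.
    """
    names = list(native)
    for use_opt in product((False, True), repeat=len(names)):
        label = "".join(n + ("'" if o else "") for n, o in zip(names, use_opt))
        seq = "".join((optimized if o else native)[n] for n, o in zip(names, use_opt))
        yield label, seq
```

The first and last candidates produced are the fully non-optimized and fully optimized sequences, which serve as reference points when the hybrids are compared.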

An algorithm for creating an optimized candidate sequence is as follows: 

1. Provide a message sequence (an entire message or a portion thereof). Go to step 2. 

2. Generate a novel candidate sequence by modifying the codon usage of a candidate 
sequence, by using the most promising candidate sequence previously identified, or 
by combining regions of two or more candidates previously identified to produce a 
novel hybrid. Go to step 3. 

3. Evaluate the candidate sequence and determine if it has a predetermined property. If 
the candidate has the predetermined property, then proceed to step 4, otherwise 
proceed to step 2. 

4. Use the candidate sequence as an optimized message. 
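
The four steps above amount to a generate-and-test loop, which can be sketched as follows. The callables `propose`, `evaluate` and `has_property` are hypothetical placeholders supplied by the caller (in practice the evaluation may be an experimental measurement rather than a computation), and the round limit is an added safeguard that the steps themselves do not specify.

```python
def optimize_message(start, propose, evaluate, has_property, max_rounds=100):
    """Repeat steps 2 and 3 until a candidate has the predetermined property.

    propose(best)       -- returns a new candidate derived from the best so far
    evaluate(candidate) -- returns a score for the candidate
    has_property(score) -- True if the score meets the predetermined property
    Returns the accepted candidate, or None if max_rounds is exhausted.
    """
    best, best_score = start, evaluate(start)      # step 1: starting message
    for _ in range(max_rounds):
        candidate = propose(best)                  # step 2: generate a candidate
        score = evaluate(candidate)                # step 3: evaluate it
        if has_property(score):
            return candidate                       # step 4: use this candidate
        if score > best_score:                     # remember the most promising one
            best, best_score = candidate, score
    return None
```

Combining regions of two or more earlier candidates, the other option named in step 2, would be implemented inside `propose`, which could keep its own record of previous candidates.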

Methods can include first optimizing a mammalian synthetic nucleic acid sequence which 
encodes a protein of interest or a portion thereof, e.g., human Factor VIII, human Factor IX, 
human α-galactosidase, etc. The synthetic nucleic acid sequence can be optimized such that 
94%, 95%, 96%, 97%, 98%, 99%, or all, of the codons of the synthetic DNA are replaced with 
common codons. The next step involves determining the amount of protein produced as a result 
of message optimization compared to the amount of protein produced using the wild type 
sequence. In instances where the amount of protein produced is not of the desired or expected 
level, it may be desirable to replace one or more of the common codons of the protein-coding 
region with a less-common codon or non-common codon. A mammalian optimized message 
which is re-engineered such that common codons are replaced with less-common or 
non-common mammalian codons, or common codons of other eukaryotic species, can result in at least 
1%, 5%, 10%, 20% or more of the common codons being replaced. Re-engineering the 
optimized message can be done, for example, systematically by replacing a single common 
codon with a less-common or non-common codon. Alternatively, a block of 2, 4, 6, 10, 20, 40 or 
more codons may be replaced with less-common or non-common codons. The level of protein 
produced by these "re-engineered optimized" messages determines which re-engineered 
optimized message is chosen. 

Another approach of optimizing a message for increased protein expression includes 
altering the specific nucleotide content of an optimized synthetic nucleic acid sequence. The 
synthetic nucleic acid sequence can be altered by increasing or decreasing specific nucleotide(s) 
content, e.g., G, C, A, T, GC or AT content of the sequence. Increasing or decreasing the 
specific nucleotide content of a synthetic nucleotide sequence can be done by substituting the 
nucleotide of interest with another nucleotide. For example, a sequence that has a large number 
of codons that have a high GC content, e.g., glycine (GGC), can be substituted with codons that 
have a less GC-rich content, e.g., glycine (GGT) or an AT-rich codon. Similarly, a sequence that 
has a large number of codons that have a high AT content can be substituted with codons that 
have a less AT-rich content, e.g., a GC-rich codon. Any region, or all, of a synthetic nucleic acid 
sequence can be altered in this manner, e.g., the 5'UTR (e.g., the promoter-proximal coding 
region), the coding region, the intron sequence, or the 3'UTR. Preferably, nucleotide 
substitutions in the coding region do not result in an alteration of the amino acid sequence of the 
expressed product. Preferably, the nucleotide content, e.g., GC or AT content, of a sequence is 
increased or reduced by 10%, 20%, 30%, 40% or more. 
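
As an illustration of the glycine example above, the following sketch measures GC content and swaps one codon for another in frame. The function names are hypothetical, and the caller is assumed to supply two codons that encode the same amino acid so that the expressed product is unchanged.

```python
def gc_content(seq):
    """Fraction of nucleotides in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def swap_codon(seq, old, new):
    """Replace every in-frame occurrence of codon `old` with codon `new`.

    Assumes `old` and `new` are synonymous, e.g. GGC and GGT for glycine.
    """
    codons = [seq[i:i + 3].upper() for i in range(0, len(seq), 3)]
    return "".join(new if c == old else c for c in codons)
```

For the nine-nucleotide sequence GGCGGCAAG, replacing GGC with GGT lowers the GC content from 7/9 to 5/9, a relative reduction of about 29%.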

The synthetic nucleic acid sequence can encode a mammalian, e.g., a human protein. 
The protein can be, e.g., one which is endogenously human, or an engineered protein. 
Engineered proteins include proteins which differ from the native protein by one or more amino 
acid residues. Examples of such proteins include fragments, e.g., internal fragments or 
[...] proteins having one or more amino acid replacements. 

A sequence which encodes the protein can have one or more introns. The synthetic 
nucleic acid sequence can include introns, as they are found in the non-optimized sequence or 
can include introns from a non-related gene. In other embodiments the intronic sequences can be 
modified. For example, all or part of one or more introns present in the gene can be removed or 
introns not found in the sequence can be added. In preferred embodiments, one or more entire 
introns present in the gene are not present in the synthetic nucleic acid. In another embodiment, 
all or part of an intron present in a gene is replaced by another sequence, e.g., an intronic 
sequence from another protein. 

The synthetic nucleic acid sequence can encode: any protein including a blood factor, 
e.g., blood clotting factor V, blood clotting factor VII, blood clotting factor VIII, blood clotting 
factor IX, blood clotting factor X, or blood clotting factor XIII; an interleukin, e.g., interleukin 1, 
interleukin 2, interleukin 3, interleukin 6, interleukin 11, or interleukin 12; erythropoietin; 
calcitonin; growth hormone; insulin; insulinotropin; insulin-like growth factors; parathyroid 
hormone; β-interferon; γ-interferon; nerve growth factors; FSHβ; tumor necrosis factor; 
glucagon; bone growth factor-2; bone growth factor-7; TSH-β; CSF-granulocyte; 
CSF-macrophage; CSF-granulocyte/macrophage; immunoglobulins; catalytic antibodies; protein 
kinase C; glucocerebrosidase; superoxide dismutase; tissue plasminogen activator; urokinase; 
antithrombin III; DNAse; α-galactosidase; tyrosine hydroxylase; apolipoprotein E; 
apolipoprotein A-I; globins; low density lipoprotein receptor; IL-2 receptor; IL-2 antagonists; 
alpha-1 antitrypsin; immune response modifiers; soluble CD4; a protein expressed under 
disease conditions; and proteins encoded by viruses, e.g., proteins which are encoded by a virus 
(including a retrovirus) which are expressed in mammalian cells post-infection. 

In preferred embodiments, the synthetic nucleic acid sequence can express its protein, 
e.g., a eukaryotic, e.g., mammalian, protein, at a level which is at least 110%, 150%, 200%, 
500%, 1,000%, 5,000% or even 10,000% of that expressed by a nucleic acid sequence that has not 
been optimized. This comparison can be made, e.g., in an in vitro mammalian cell culture 
system wherein the non-optimized and optimized sequences are expressed under the same 
conditions (e.g., the same cell type, same culture conditions, same expression vector). 

Suitable cell culture systems for measuring expression of the synthetic nucleic acid 
sequence and corresponding non-optimized nucleic acid sequence are known in the art (e.g., the 
[...] molecular biology reference books. Vectors suitable for expressing the synthetic and 
non-optimized nucleic acid sequences encoding the protein of interest are described below and in the 
standard reference books described below. Expression can be measured using an antibody 
specific for the protein of interest (e.g., ELISA). Such antibodies and measurement techniques 
are known to those skilled in the art. 

In a preferred embodiment the protein is a human protein. In more preferred 
embodiments, the protein is human Factor VIII and the protein is a B domain deleted human 
Factor VIII. In another preferred embodiment the protein is B domain deleted human Factor 
VIII with a sequence which includes a recognition site for an intracellular protease of the 
PACE/furin class, such as an X-ARG-X-X-ARG site, a short-peptide linker, e.g., a two peptide 
linker, e.g., a leucine-glutamic acid peptide linker (LE), or a three, or four peptide linker, inserted 
at the heavy-light chain junction (see Fig. 1). 

A large fraction of the codons in the human messages encoding Factor VIII and Factor IX 
are non-common codons or less common codons. Replacement of at least 98% of these codons 
with common codons will yield nucleic acid sequences capable of higher level expression in a 
cell culture. Preferably, all of the codons are replaced with common codons and such 
replacement results in at least a 2 to 5 fold, more preferably a 10 fold and most preferably a 20 
fold increase in expression when compared to expression of the corresponding native 
sequence in the same expression system. 

The synthetic nucleic acid sequences of the invention can be introduced into the cells of 
a living organism. The sequences can be introduced directly, e.g., via homologous 
recombination, or via a vector. For example, DNA constructs or vectors can be used to introduce 
a synthetic nucleic acid sequence into cells of a living organism for gene therapy. See, e.g., U.S. 
Patent No. 5,460,959; and co-pending U.S. applications USSN 08/334,797; USSN 08/231,439; 
USSN 08/334,455; and USSN 08/928,881 which are hereby expressly incorporated by reference 
in their entirety. 

Transfected or Infected Cells 

Primary and secondary cells to be transfected or infected can be obtained from a variety 
of tissues and include cell types which can be maintained and propagated in culture. For 
[...] keratinocytes, epithelial cells (e.g., mammary epithelial cells, intestinal epithelial cells), 
endothelial cells, glial cells, neural cells, a cell comprising a formed element of the blood (e.g., 
lymphocytes, bone marrow cells), muscle cells and precursors of these somatic cell types. 
Primary cells are preferably obtained from the individual to whom the transfected or infected 
primary or secondary cells are administered. However, primary cells may be obtained from a 
donor (other than the recipient) of the same species or another species (e.g., mouse, rat, rabbit, 
cat, dog, pig, cow, bird, sheep, goat, horse). 

Primary or secondary cells of vertebrate, particularly mammalian, origin can be 
transfected or infected with exogenous synthetic DNA encoding a therapeutic protein and 
produce an encoded therapeutic protein stably and reproducibly, both in vitro and in vivo, over 
extended periods of time. In addition, the transfected or infected primary and secondary cells 
can express the encoded product in vivo at physiologically relevant levels, and the cells can be recovered 
after implantation and, upon reculturing, grow and display their preimplantation properties. 

The transfected or infected primary or secondary cells may also include DNA encoding a 
selectable marker which confers a selectable phenotype upon them, facilitating their 
identification and isolation. Methods for producing transfected primary or secondary cells which 
stably express exogenous synthetic DNA, clonal cell strains and heterogenous cell strains of such 
transfected cells, methods of producing the clonal and heterogenous cell strains, and methods of 
treating or preventing an abnormal or undesirable condition through the use of populations of 
transfected primary or secondary cells are part of the present invention. Primary and secondary 
cells which can be transfected or infected include fibroblasts, keratinocytes, epithelial cells (e.g., 
mammary epithelial cells, intestinal epithelial cells), endothelial cells, glial cells, neural cells, a 
cell comprising a formed element of the blood (e.g., a lymphocyte, a bone marrow cell), muscle 
cells and precursors of these somatic cell types. Primary cells are preferably obtained from the 
individual to whom the transfected or infected primary or secondary cells are administered. 
However, primary cells may be obtained from a donor (other than the recipient) of the same 
species or another species (e.g., mouse, rat, rabbit, cat, dog, pig, cow, bird, sheep, goat, horse). 
Transformed or immortalized cells can also be used, e.g., a Bowes Melanoma cell (ATCC 
Accession No. CRL 9607), a Daudi cell (ATCC Accession No. CCL 213), a HeLa cell and a 
derivative of a HeLa cell (ATCC Accession Nos. CCL 2, CCL 2.1, and CCL 2.2), a HL-60 cell 
(ATCC Accession No. CCL 240), a HT-1080 cell (ATCC Accession No. CCL 121), a Jurkat cell 
(ATCC Accession No. TIB 152), a KB carcinoma cell (ATCC Accession No. CCL 17), a K-562 
leukemia cell (ATCC Accession No. CCL 243), a MCF-7 breast cancer cell (ATCC Accession 
No. HTB 22), a MOLT-4 cell (ATCC Accession No. 1582), a Namalwa cell (ATCC Accession 
No. CRL 1432), a Raji cell (ATCC Accession No. CCL 86), a RPMI 8226 cell (ATCC 
Accession No. CCL 155), a U-937 cell (ATCC Accession No. CRL 1593), WI-38VA13 subline 
2R4 cells (ATCC Accession No. CCL 75.1), a CCRF-CEM cell (ATCC Accession No. CCL 
119) and a 2780AD ovarian carcinoma cell (Van Der Blick et al., Cancer Res. 48: 5927-5932, 
1988), as well as heterohybridoma cells produced by fusion of human cells and cells of another 
species. In another embodiment, the immortalized cell line can be a cell line other than a human 
cell line, e.g., a CHO cell line or a COS cell line. In a preferred embodiment, the cell is a non- 
transformed cell. In various preferred embodiments, the cell is a mammalian cell, e.g., a primary 
or secondary mammalian cell, e.g., a fibroblast, a hematopoietic stem cell, a myoblast, a 
keratinocyte, an epithelial cell, an endothelial cell, a glial cell, a neural cell, a cell comprising a 
formed element of the blood, a muscle cell and precursors of these somatic cells. In a most 
preferred embodiment, the cell is a secondary human fibroblast. 

Alternatively, DNA can be delivered into any of the cell types discussed above by a viral 
vector infection. Viruses known to be useful for gene transfer include adenoviruses, adeno- 
associated virus, herpes virus, mumps virus, poliovirus, retroviruses, Sindbis virus, and vaccinia 
virus such as canary pox virus. Use of viral vectors is well known in the art: see e.g., Robbins 
and Ghizzani, Mol Med Today 1:410-417, 1995. A cell which has an exogenous DNA 
introduced into it by a viral vector is referred to as an "infected cell". 

The invention also includes the genetic manipulation of a cell which normally produces a 
therapeutic protein. In this instance, the cell is manipulated such that the endogenous sequence 
which encodes the therapeutic protein is replaced with an optimized coding sequence, e.g., by 
homologous recombination. 

Exogenous Synthetic DNA 

Exogenous synthetic DNA incorporated into primary or secondary cells by the present 
method can be a synthetic DNA which encodes a protein, or a portion thereof, useful to treat an 
existing condition or prevent it from occurring. 

Synthetic DNA incorporated into primary or secondary cells can be an entire gene 
encoding an entire desired protein or a gene portion which encodes, for example, the active or 
functional portion(s) of the protein. The protein can be, for example, a hormone, a cytokine, an 
antigen, an antibody, an enzyme, a clotting factor, e.g., Factor VIII or Factor XI, a transport 
protein, a receptor, a regulatory protein, a structural protein, or a protein which does not occur in 
nature. The DNA can be produced using genetic engineering techniques or synthetic processes. 
The DNA introduced into primary or secondary cells can encode one or more therapeutic 
proteins. After introduction into primary or secondary cells, the exogenous synthetic DNA is 
stably incorporated into the recipient cell's genome (along with the additional sequences present 
in the DNA construct used), from which it is expressed or otherwise functions. Alternatively, the 
exogenous synthetic DNA may exist episomally within the primary or secondary cells. 


Selectable Markers 

A variety of selectable markers can be incorporated into primary or secondary cells. For 
example, a selectable marker which confers a selectable phenotype such as drug resistance, 
nutritional auxotrophy, resistance to a cytotoxic agent or expression of a surface protein, can be 
used. Selectable marker genes which can be used include neo, gpt, dhfr, ada, pac (puromycin), 
hyg and hisD. The selectable phenotype conferred makes it possible to identify and isolate 
recipient primary or secondary cells. 

DNA Constructs 

DNA constructs, which include exogenous synthetic DNA and, optionally, DNA 
encoding a selectable marker, along with additional sequences necessary for expression of the 
exogenous synthetic DNA in recipient primary or secondary cells, are used to transfect primary 
or secondary cells in which the encoded protein is to be produced. Alternatively, infectious 
vectors, such as retroviral, herpes, lentivirus, adenovirus, adenovirus-associated, mumps and 
poliovirus vectors, can be used for this purpose. 

A DNA construct which includes the exogenous synthetic DNA and additional 
sequences, such as sequences necessary for expression of the exogenous synthetic DNA, can be 
used. A DNA construct which includes DNA encoding a selectable marker, along with 
additional sequences, such as a promoter, polyadenylation site and splice junctions, can be used 
to confer a selectable phenotype upon introduction into primary or secondary cells. The two 
DNA constructs are introduced into primary or secondary cells, using methods described herein. 
Alternatively, one DNA construct which includes exogenous synthetic DNA, a selectable marker 
gene and additional sequences (e.g., those necessary for expression of the exogenous synthetic 
DNA and for expression of the selectable marker gene) can be used. 

Transfection of Primary or Secondary Cells and Production of Clonal or Heterogenous Cell 
Strains 

Vertebrate tissue can be obtained by standard methods such as punch biopsy or other 
surgical methods of obtaining a tissue source of the primary cell type of interest. For example, 
punch biopsy is used to obtain skin as a source of fibroblasts or keratinocytes. A mixture of 
primary cells is obtained from the tissue, using known methods, such as enzymatic digestion. If 
enzymatic digestion is used, enzymes such as collagenase, hyaluronidase, dispase, pronase, 
trypsin, elastase and chymotrypsin can be used. 

The resulting primary cell mixture can be transfected directly or it can be cultured first, 
removed from the culture plate and resuspended before transfection is carried out. Primary cells 
or secondary cells are combined with exogenous synthetic DNA to be stably integrated into their 
genomes and, optionally, DNA encoding a selectable marker, and treated in order to accomplish 
transfection. The exogenous synthetic DNA and selectable marker-encoding DNA are each on a 
separate construct or on a single construct, and an appropriate quantity of DNA is used to ensure that at 
least one stably transfected cell containing and appropriately expressing exogenous DNA is 
produced. In general, 0.1 to 500 µg DNA is used. 

Primary or secondary cells can be transfected by electroporation. Electroporation is 
carried out at appropriate voltage and capacitance (and time constant) to result in entry of the 
DNA construct(s) into the primary or secondary cells. Electroporation can be carried out over a 
wide range of voltages (e.g., 50 to 2000 volts) and capacitance values (e.g., 60-300 µFarads). 
Total DNA of approximately 0.1 to 500 µg is generally used. 

Primary or secondary cells can be transfected using microinjection. Alternatively, known 
methods such as calcium phosphate precipitation, modified calcium phosphate precipitation and 
polybrene precipitation, liposome fusion and receptor-mediated gene delivery can be used to 
[...] culturing conditions and for sufficient time, to propagate the stably transfected secondary cells 
and produce a clonal cell strain of transfected secondary cells. Alternatively, more than one 
transfected cell is cultured and subcultured, resulting in production of a heterogenous cell strain. 

Transfected primary or secondary cells undergo a sufficient number of doublings to 
produce either a clonal cell strain or a heterogenous cell strain of sufficient size to provide the 
therapeutic protein to an individual in effective amounts. In general, for example, 0.1 cm² of 
skin is biopsied and assumed to contain 100,000 cells; one cell is used to produce a clonal cell 
strain and undergoes approximately 27 doublings to produce 100 million transfected secondary 
cells. If a heterogenous cell strain is to be produced from an original transfected population of 
approximately 100,000 cells, only 10 doublings are needed to produce 100 million transfected 
cells. 
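
The doubling figures above follow from simple arithmetic, sketched below under the idealized assumption that every cell divides at each doubling and none is lost. The function name is hypothetical.

```python
import math


def doublings_needed(starting_cells, target_cells):
    """Smallest whole number of doublings that takes starting_cells to at least target_cells."""
    return math.ceil(math.log2(target_cells / starting_cells))


# One cell expanded to a clonal strain of 100 million cells.
clonal = doublings_needed(1, 100_000_000)
# A heterogenous strain started from about 100,000 transfected cells.
heterogenous = doublings_needed(100_000, 100_000_000)
```

Here `clonal` is 27, since 2^27 is about 134 million, and `heterogenous` is 10, since 100,000 x 2^10 is about 102 million, in agreement with the figures given above.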

The number of required cells in a transfected clonal or heterogenous cell strain is variable 
and depends on a variety of factors, including but not limited to, the use of the transfected cells, 
the functional level of the exogenous DNA in the transfected cells, the site of implantation of the 
transfected cells (for example, the number of cells that can be used is limited by the anatomical 
site of implantation), and the age, surface area, and clinical condition of the patient. To put these 
factors in perspective, to deliver therapeutic levels of human growth hormone in an otherwise 
healthy 10 kg patient with isolated growth hormone deficiency, approximately one to five 
hundred million transfected fibroblasts would be necessary (the volume of these cells is about 
that of the very tip of the patient's thumb). 

Episomal Expression of Exogenous Synthetic DNA 

DNA sequences that are present within the cell yet do not integrate into the genome are 
referred to as episomes. Recombinant episomes may be useful in at least three settings: 1) if a 
given cell type is incapable of stably integrating the exogenous synthetic DNA; 2) if a given cell 
type is adversely affected by the integration of synthetic DNA; and 3) if a given cell type is 
capable of improved therapeutic function with an episomal rather than integrated synthetic DNA. 

Using transfection and culturing as described herein, exogenous synthetic DNA in the 
form of episomes can be introduced into vertebrate primary and secondary cells. Plasmids can 
be converted into such an episome by the addition of DNA sequences for the Epstein-Barr virus
origin of replication and nuclear antigen (Yates, J.L. Nature 319:780-7883 (1985)).
Alternatively, vertebrate autonomously replicating sequences can be introduced into the
construct (Weidle, U.H. Gene 73(2):427-437 (1988)). These and other episomally derived
sequences can also be included in DNA constructs without selectable markers, such as pXGH5
(Selden et al., Mol Cell Biol 6:3173-3179, 1986). The episomal synthetic exogenous DNA is
then introduced into primary or secondary vertebrate cells as described in this application (if a 
selective marker is included in the episome a selective agent is used to treat the transfected cells). 

Implantation of Clonal Cell Strain or Heterogenous Cell Strain of Transfected Secondary Cells 
The transfected or infected cells produced as described above can be introduced into an
individual to whom the therapeutic protein is to be delivered. The clonal cell strain or
heterogenous cell strain is introduced using known methods, by various routes of
administration and at various sites (e.g., renal subcapsular, subcutaneous,
central nervous system (including intrathecal), intravascular, intrahepatic, intrasplanchnic,
intraperitoneal (including intraomental), or intramuscular implantation). In a preferred
embodiment, the clonal cell strain or heterogeneous cell strain is introduced into the omentum.
The omentum is a membranous structure containing a sheet of fat. Usually, the omentum is a
fold of peritoneum extending from the stomach to adjacent abdominal organs. The greater
omentum is attached to the inferior edge of the stomach and hangs down in front of the intestines.
The other edge is attached to the transverse colon. The lesser omentum is attached to the
superior edge of the stomach and extends to the undersurface of the liver. The cells may be 
introduced into any part of the omentum by surgical implantation, laparoscopy or direct 
injection, e.g., via CT-guided needle or ultrasound. Once implanted in the individual, the cells 
produce the therapeutic product encoded by the exogenous synthetic DNA or are affected by the 
exogenous synthetic DNA itself. For example, an individual who has been diagnosed with 
Hemophilia A, a bleeding disorder that is caused by a deficiency in Factor VIII, a protein 
normally found in the blood, is a candidate for a gene therapy treatment. In another example, an 
individual who has been diagnosed with Hemophilia B, a bleeding disorder that is caused by a 
deficiency in Factor IX, a protein normally found in the blood, is a candidate for a gene therapy
treatment. The patient has a small skin biopsy performed. This is a simple procedure which can
be performed on an out-patient basis. The piece of skin, approximately the size of a matchhead,
is taken, for example, from under the arm and requires about one minute to remove. The sample
is processed, resulting in isolation of the patient's cells, which are genetically engineered to produce the
missing Factor IX or Factor VIII. Based on the age, weight, and clinical condition of the patient,
the required number of cells are grown in large-scale culture. The entire process requires 4-6
weeks and, at the end of that time, the appropriate number of genetically engineered cells (e.g.,
approximately 100-500 million) is introduced into the individual, once again as an outpatient (e.g.,
by injecting them back under the patient's skin). The patient is now capable of producing his or
her own Factor IX or Factor VIII and is no longer a hemophiliac.

A similar approach can be used to treat other conditions or diseases. For example, short 
stature can be treated by administering human growth hormone to an individual by implanting 
primary or secondary cells which express human growth hormone; anemia can be treated by 
administering erythropoietin (EPO) to an individual by implanting primary or secondary cells 
which express EPO; or diabetes can be treated by administering glucagon-like peptide-1 (GLP-1)
to an individual by implanting primary or secondary cells which express GLP-1. A lysosomal
storage disease (LSD) can also be treated by this approach. LSDs represent a group of at least 41
distinct genetic diseases, each one representing a deficiency of a particular protein that is
involved in lysosomal biogenesis. A particular LSD can be treated by administering a lysosomal
enzyme to an individual by implanting primary or secondary cells which express the lysosomal
enzyme, e.g., Fabry disease can be treated by administering α-galactosidase to an individual by
implanting primary or secondary cells which express α-galactosidase; Gaucher disease can be
treated by administering β-glucoceramidase to an individual by implanting primary or secondary
cells which express β-glucoceramidase; MPS (mucopolysaccharidosis) type I (Hurler-Scheie
syndrome) can be treated by administering α-iduronidase to an individual by implanting primary
or secondary cells which express α-iduronidase; MPS type II (Hunter syndrome) can be treated
by administering α-L-iduronidase to an individual by implanting primary or secondary cells
which express α-L-iduronidase; MPS type III-A (Sanfilippo A syndrome) can be treated by
administering glucosamine-N-sulfatase to an individual by implanting primary or secondary cells
which express glucosamine-N-sulfatase; MPS type III-B (Sanfilippo B syndrome) can be treated
by administering alpha-N-acetylglucosaminidase to an individual by implanting primary or
secondary cells which express alpha-N-acetylglucosaminidase; MPS type III-C (Sanfilippo C
syndrome) can be treated by administering acetylcoenzyme A:α-glucosaminide-N-acetyltransferase
to an individual by implanting primary or secondary cells which express
acetylcoenzyme A:α-glucosaminide-N-acetyltransferase; MPS type III-D (Sanfilippo D
syndrome) can be treated by administering N-acetylglucosamine-6-sulfatase to an individual by
implanting primary or secondary cells which express N-acetylglucosamine-6-sulfatase; MPS
type IV-A (Morquio A syndrome) can be treated by administering N-acetylglucosamine-6-sulfatase
to an individual by implanting primary or secondary cells which express
N-acetylglucosamine-6-sulfatase; MPS type IV-B (Morquio B syndrome) can be treated by
administering β-galactosidase to an individual by implanting primary or secondary cells which
express β-galactosidase; MPS type VI (Maroteaux-Lamy syndrome) can be treated by
administering N-acetylgalactosamine-6-sulfatase to an individual by implanting primary or
secondary cells which express N-acetylgalactosamine-6-sulfatase; MPS type VII (Sly syndrome)
can be treated by administering β-glucuronidase to an individual by implanting primary or
secondary cells which express β-glucuronidase.

The cells used for implantation will generally be patient-specific genetically engineered 
cells. It is possible, however, to obtain cells from another individual of the same species or from 
a different species. Use of such cells might require administration of an immunosuppressant, 
alteration of histocompatibility antigens, or use of a barrier device to prevent rejection of the
implanted cells. For many diseases, this will be a one-time treatment and, for others, multiple
gene therapy treatments will be required. 

Uses of Transfected or Infected Primary and Secondary Cells and Cell Strains 
Transfected or infected primary or secondary cells or cell strains have wide applicability 
as a vehicle or delivery system for therapeutic proteins, such as enzymes, hormones, cytokines, 
antigens, antibodies, clotting factors, anti-sense RNA, regulatory proteins, transcription proteins, 
receptors, structural proteins, novel (non-optimized) proteins and nucleic acid products, and 
engineered DNA. For example, transfected primary or secondary cells can be used to supply a 
therapeutic protein, including, but not limited to, Factor VIII, Factor IX, erythropoietin, alpha-1
antitrypsin, calcitonin, glucocerebrosidase, growth hormone, low density lipoprotein (LDL)
receptor, IL-2 receptor and its antagonists, insulin, globin, immunoglobulins, catalytic antibodies,
the interleukins, insulin-like growth factors, superoxide dismutase, immune responder modifiers, 
parathyroid hormone and interferon, nerve growth factors, tissue plasminogen activators, and 
colony stimulating factors. Alternatively, transfected primary and secondary cells can be used to 
immunize an individual (i.e., as a vaccine). 

The wide variety of uses of cell strains of the present invention can perhaps most 
conveniently be summarized as shown below. The cell strains can be used to deliver the
following therapeutic products:

1. a secreted protein with predominantly systemic effects;
2. a secreted protein with predominantly local effects;
3. a membrane protein imparting new or enhanced cellular responsiveness;
4. a membrane protein facilitating removal of a toxic product;
5. a membrane protein marking or targeting a cell;
6. an intracellular protein;
7. an intracellular protein directly affecting gene expression; and
8. an intracellular protein with autocrine effects.


Transfected or infected primary or secondary cells can be used to administer therapeutic 
proteins (e.g., hormones, enzymes, clotting factors) which are presently administered 
intravenously, intramuscularly or subcutaneously, which requires patient cooperation and, often, 



WO 02/064799 


PCT/US01/42655 


medical staff participation. When transfected or infected primary or secondary cells are used, 
there is no need for extensive purification of the polypeptide before it is administered to an 
individual, as is generally necessary with an isolated polypeptide. In addition, transfected or 
infected primary or secondary cells of the present invention produce the therapeutic protein as it 
would normally be produced. 

An advantage to the use of transfected or infected primary or secondary cells is that by 
controlling the number of cells introduced into an individual, one can control the amount of the 
protein delivered to the body. In addition, in some cases, it is possible to remove the transfected 
or infected cells if there is no longer a need for the product. A further advantage of treatment by 
use of transfected or infected primary or secondary cells of the present invention is that 
production of the therapeutic product can be regulated, such as through the administration of 
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or affects the stability of a nucleic acid product. 
Transgenic animals 

A number of methods have been used to obtain transgenic, non-human mammals. A 
transgenic non-human mammal refers to a mammal that has gained an additional gene through 
the introduction of an exogenous synthetic nucleic acid sequence, i.e., transgene, into its own 
cells (e.g., both the somatic and germ cells), or into an ancestor's germ line. 

There are a number of methods to introduce the exogenous DNA into the germ line (e.g.,
introduction into the germ or somatic cells) of a mammal. One method is by microinjection of
the gene construct into the pronucleus of an early stage embryo (e.g., before the four-cell stage)
(Wagner et al., Proc. Natl. Acad. Sci. USA 78:5016 (1981); Brinster et al., Proc. Natl. Acad. Sci.
USA 82:4438 (1985)). The detailed procedure to produce such transgenic mice has been
described (see e.g., Hogan et al., Manipulating the Mouse Embryo, Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY (1986); US Patent No. 5,175,383 (1992)). This procedure
has also been adapted for other mammalian species (e.g., Hammer et al., Nature 315:680 (1985);
Murray et al., Reprod. Fert. Devl. 1:147 (1989); Pursel et al., Vet. Immunol. Histopath. 17:303
(1987); Rexroad et al., J. Reprod. Fert. 41(suppl):119 (1990); Rexroad et al., Molec. Reprod.
Devl. 1:164 (1989); Simons et al., BioTechnology 6:179 (1988); Vize et al., J. Cell Sci. 90:295
(1988); and Wagner, J. Cell Biochem. 13B(suppl):164 (1989)).

Another method for producing germ-line transgenic mammals is through the use of
embryonic stem cells or somatic cells (e.g., embryonic, fetal or adult). The gene construct may
be introduced into embryonic stem cells by homologous recombination (Thomas et al., Cell
51:503 (1987); Capecchi, Science 244:1288 (1989); Joyner et al., Nature 338:153 (1989)). A
suitable construct may also be introduced into the embryonic stem cells by DNA-mediated 
transfection, such as electroporation (Ausubel et al., Current Protocols in Molecular Biology, 
John Wiley & Sons (1987)). Detailed procedures for culturing embryonic stem cells (e.g. ESD- 
3, ATCC# CCL-1934, ES-E14TG-2a, ATCC# CCL-1821, American Type Culture Collection, 
Rockville, MD) and the methods of making transgenic mammals from embryonic stem cells can 
be found in Teratocarcinomas and Embryonic Stem Cells, A Practical Approach, ed. E.J. 
Robertson (IRL Press, 1987). Methods of making transgenic animals from somatic cells can be

In the above methods for the generation of germ-line transgenic mammals, the construct
may be introduced as a linear construct, as a circular plasmid, or as a vector which may be
incorporated and inherited as a transgene integrated into the host genome. The transgene may
also be constructed so as to permit it to be inherited as an extrachromosomal plasmid (Gassmann,
M. et al., Proc. Natl. Acad. Sci. USA 92:1292 (1995)).


Human Factor VIII 

hFVIII is encoded by a 186 kilobase (kb) gene, with the coding region distributed among 
26 exons (Gitschier et al., Nature, 312:326-330, (1984)). Transcription of the gene and splicing
of the resulting primary transcript results in an mRNA of approximately 9 kb which encodes a 
primary translation product containing 2351 amino acids (aa), including a 19 aa signal peptide. 
Excluding the signal peptide, the 2332 aa protein has a domain structure which can be 
represented as NH2-A1-A2-B-A3-C1-C2-COOH, with a predicted molecular mass of 265 
kilodaltons (kD). Glycosylation of this protein results in a product with a molecular mass of 
approximately 330 kD as determined by SDS-PAGE. In plasma, hFVIII is a heterodimeric 
protein consisting of a heavy chain that ranges in size from 90 kD to 200 kD in a metal ion 
complex with an 80 kD light chain. The heterodimeric complex is further stabilized by 
interactions with vWF. The heavy chain is comprised of domains A1-A2-B and the light chain is 
comprised of domains A3-C1-C2 (Figure 2). Protease cleavage sites in the B-domain account
for the size variation of the heavy chain, with the 90 kD species containing no B-domain
sequences and the 200 kD species containing a complete or nearly complete B-domain. The B- 
domain has no known function and it is fully removed upon hFVIII activation by thrombin. 

Human Factor VIII expression plasmids pXF8.186 (Figure 3), pXF8.61 (Figure
4), pXF8.38 (Fig. 11) and pXF8.224 (Fig. 13) are described below. The hFVIII expression
construct plasmid pXF8.186 was developed based on detailed optimization studies which
resulted in high level expression of a functional hFVIII. Given the extremely large size of the 
hFVIII gene and the need to transfer the entire coding region into cells, cDNA expression 
plasmids were developed for the production of stably transfected clonal cell strains. It has 
proven difficult to achieve high level expression of hFVIII using the wild-type 9 kb cDNA. 
Three potential reasons for the poor expression are as follows. First, the wild-type cDNA 
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chain and has no known function (Figure 1). Removal of the region encoding the B-domain 
from hFVIII expression constructs leads to greatly improved expression of a functional protein. 
Analysis of hFVIII derivatives lacking the B-domain has demonstrated that hFVIII function is 
not adversely affected and that such molecules have biochemical, immunologic, and in vivo 
functional properties which are very similar to the wild-type protein. Two different BDD hFVIII 
expression constructs have been developed, which encode proteins with different amino acid 
sequences flanking the deletion. Plasmid pXF8.186 contains a complete deletion of the
B-domain (amino acids 741-1648 of the wild-type mature protein sequence), with the sequence
Arg-Arg-Arg-Arg (RRRR) inserted at the heavy chain-light chain junction (Figure 1). This 
results in a string of five consecutive arginine residues (RRRRR or 5R) at the heavy chain-light 
chain junction, which comprises a recognition site for an intracellular protease of the PACE/furin 
class, and was predicted to promote cleavage to produce the correct heavy and light chains. 
Plasmid pXF8.61 also contains a complete deletion of the B-domain with a synthetic XhoI site at
the junction. This linker results in the presence of the dipeptide sequence Leu-Glu (LE) at the
heavy chain-light chain junction. The two forms of BDD hFVIII expressed are
referred to herein as 5R and LE BDD hFVIII.
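
The two junctions described above can be illustrated on a placeholder sequence. This is a sketch under stated assumptions: the helper `delete_b_domain` and the toy sequence are hypothetical, and the toy sequence merely stands in for the 2332-residue mature protein (it is not the real hFVIII sequence). Residue 740 is set to arginine in the toy sequence so that the RRRR insert produces the five consecutive arginines the text describes.

```python
def delete_b_domain(mature: str, linker: str, first: int = 741, last: int = 1648) -> str:
    """Remove residues first..last (1-based, inclusive) and insert a linker at the junction."""
    if not (1 <= first <= last <= len(mature)):
        raise ValueError("deletion range falls outside the sequence")
    return mature[: first - 1] + linker + mature[last:]


# Placeholder for the mature protein: 'H' = heavy-chain residue, 'B' = B-domain
# residue, 'L' = light-chain residue. Only the lengths and the arginine at
# position 740 matter for this illustration.
toy = "H" * 739 + "R" + "B" * 908 + "L" * 684

five_r = delete_b_domain(toy, "RRRR")  # junction as described for pXF8.186
le = delete_b_domain(toy, "LE")        # junction as described for pXF8.61

print(len(toy), len(five_r), len(le))
print(five_r[735:746])  # residues around the 5R junction
print(le[735:746])      # residues around the LE junction
```

The deleted range spans 908 residues, so the 5R and LE placeholders are 1428 and 1426 residues long respectively.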

The second feature which has been reported to adversely affect hFVIII expression in 
transfected cells relates to the observation that one or more regions of the coding region have 
been identified which effectively function to block transcription of the cDNA sequence. The
inventors have now discovered that the negative influence of the sequence elements can be
reduced or eliminated by altering the entire coding sequence. To this end, a completely synthetic 
B-domain deleted hFVIII cDNA was prepared as described in greater detail below. Silent base 
changes were made in all codons which did not correspond to the triplet sequence most 
frequently found for that amino acid in highly expressed human proteins, and such codons were 
converted to the codon sequence most frequently found in humans for the corresponding amino 
acid. The resulting coding sequence has a total of 1094 of 4335 base pairs which differ from the 
wild-type sequence, yet it encodes a protein with the wild-type hFVIII sequence (with the 
exception of the deletion of the B-domain). 25.2% of the bases were changed, and the GC 
content of the sequence increased from 44% to 64%. This sequence-altered BDD hFVIII cDNA 
is expressed at least 5.3-fold more efficiently than a non-altered control construct. 
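
The codon replacement described above, and the two summary statistics quoted for it (fraction of bases changed and GC content), can be sketched as follows. This is an illustration only, not the procedure actually used for the synthetic cDNA. The `PREFERRED` table is an assumed set of most-frequent human codons supplied for the example, and the names `optimize`, `gc_content` and `fraction_changed` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed preferred codon per amino acid (illustrative; not copied from the application).
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def codons(cds: str):
    """Split a coding sequence into triplets."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TO_AA[c] for c in codons(cds))


def optimize(cds: str) -> str:
    """Replace every codon with the preferred codon for the same amino acid."""
    return "".join(PREFERRED[CODON_TO_AA[c]] for c in codons(cds))


def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def fraction_changed(before: str, after: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    if len(before) != len(after):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(before, after)) / len(before)


# Short illustrative input; any in-frame coding sequence can be used.
original = "ATGCAAATAGAGCTCTCCACCTGCTTCTTTCTGTGCCTTTTGCGATTCTGC"
altered = optimize(original)
assert translate(altered) == translate(original)  # only silent changes are made
print(altered)
print(round(100 * fraction_changed(original, altered), 1), "% of bases changed")
print(round(100 * gc_content(original)), "->", round(100 * gc_content(altered)), "% GC")
```

The same `fraction_changed` arithmetic applied to the figures in the text gives 1094 / 4335, or about 25.2%.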

The third feature which was optimized to improve hFVIII expression was the intron-exon 
structure of the expression construct. The cDNA is, by definition, devoid of introns. While this 
reduces the size of the expression construct, it has been shown that introns can have strong 
positive effects on gene expression when added to cDNA expression constructs. The 5' 
untranslated region of the human beta-actin gene, which contains a complete, functional intron 
was incorporated into the BDD hFVIII expression constructs pXF8.61 and pXF8.186. 

The fourth feature which can adversely affect hFVIII expression is the stability of the 
Factor VIII mRNA. The stability of the message can affect the steady-state level of the Factor
VIII mRNA, and influence gene expression. Specific sequences within Factor VIII can be
altered so as to increase the stability of the mRNA, e.g., the removal of AURE from the 3' UTR 
can result in a more stable Factor VIII mRNA. The data presented below show that coding 
sequence re-engineering has general utility for the improvement of expression of mammalian and 
non-mammalian eukaryotic genes in mammalian cells. The results obtained here with human 
Factor VIII suggest that systemic codon optimization (with disregard to CpG content) provides a 
fruitful strategy for improving the expression in mammalian cells of a wide variety of eukaryotic 
genes.

Methods of Making Synthetic Nucleotide Sequences

A synthetic nucleic acid sequence which directs the synthesis of an optimized message of 
the invention can be made, e.g., by any of the methods described herein. The methods described 
below are advantageous for making optimized messages for the following reasons: 

1) they allow for production of a highly optimized protein, e.g., a protein having at 
least 94 to 100% of codons as common codons, especially for proteins larger than 90 amino 
acids in length. The final product can be 100% optimized, i.e., every single nucleotide is as
chosen, without the need to introduce undesirable alterations every 100-300 bp. A gene can be
synthesized with 100% optimized codons, or it can be synthesized with 100% of the codons that are
desired. Additional DNA sequence elements can be introduced or avoided without any
limitations imposed by the need to introduce restriction enzyme sites. Such sequence elements
could include:

- Transcriptional signals, such as enhancers or silencers. 

- Splicing signals, for example avoiding cryptic splice sites in a cDNA, or optimizing the splice 
site context in an intron-containing gene. Adding an intron to a cDNA may aid expression and 
allows the introduction of transcriptional signals within the gene. 

- Instability signals - the creation or avoidance of sequences that direct mRNA breakdown. 

- Secondary structure - the creation or avoidance of secondary structures in the mRNA that may 
affect mRNA stability, transcriptional termination, or translation. 

- Translational signals - Codon choice. A gene can be synthesized with 100% optimal codons, or
the codon bias for any amino acid can be altered without restriction to make gene expression 
sensitive to the concentration of an amino-acyl-tRNA, whose concentration may vary with 
growth or metabolic conditions. 

In each case, the goal may be to increase or decrease expression to bring expression 
under a particular form of regulation. 

2) they improve accuracy of the synthetic sequence because they avoid PCR 
amplification which introduces errors into the amplified sequence; and 

3) they reduce the cost of making the synthetic sequence of the invention.

The synthetic nucleic acid sequence which directs the synthesis of the optimized
messages of the invention can be prepared, e.g., by using the strategy which is outlined in greater
detail below.

Strategy for building a sequence

The initial step is to devise a cloning protocol.

A sequence file containing 100% of the desired DNA sequence is generated. This sequence
is analyzed for restriction sites, including fusion sites.
Fusion sites are, in order of preference: 

A) Sequences resulting from the ligation of two complementary overhangs normally generated 
by available restriction enzymes, e.g., 

SalI/XhoI = G^TCGAG
            CAGCT^C

or BspDI/BstBI = AT^CGAA
                 TAGC^TT

or BstBI/AccI = TT^CGAC
                AAGC^TG.

B) Sequences resulting from the ligation of two overhangs generated by partially filling-in the 
overhangs of available restriction enzymes, e.g., 

XhoI(+TC)/BamHI(+GA) = CTC^GATCC
                       GAGCT^AGG.

C) Sequences resulting from the blunt ligation of two blunt ends normally generated by available 
restriction enzymes, e.g., 

EheI/SmaI = GGC^GGG
            CCG^CCC.

D) Sequences resulting from the blunt ligation of two blunt ends, where one or both blunt ends 
have been generated by filling in an overhang, e.g., 

BamHI(+GATC)/SmaI = GGATC^GGG
                    CCTAG^CCC
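
The fusion sites of types (A) and (C) above can be derived mechanically from the recognition sequences and cut positions of the two enzymes. The sketch below is illustrative only: the `Enzyme` class and `fuse` function are hypothetical names, only 5' overhangs and blunt ends are modelled, and the AccI entry uses GTCGAC as one instance of that enzyme's degenerate recognition sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # recognition sequence, top strand, 5'->3'
    cut: int   # top-strand cut position within the site

    def overhang(self) -> str:
        """Single-stranded bases left by a symmetric cut ('' for a blunt cutter)."""
        if self.cut * 2 > len(self.site):
            raise ValueError("3' overhangs are not modelled in this sketch")
        return self.site[self.cut: len(self.site) - self.cut]


def fuse(left: Enzyme, right: Enzyme) -> str:
    """Top-strand sequence created when an end cut by `left` is ligated to an end cut by `right`."""
    if left.overhang() != right.overhang():
        raise ValueError(f"{left.name} and {right.name} ends are not compatible")
    return left.site[: left.cut] + right.site[right.cut:]


SALI = Enzyme("SalI", "GTCGAC", 1)
XHOI = Enzyme("XhoI", "CTCGAG", 1)
BSTBI = Enzyme("BstBI", "TTCGAA", 2)
ACCI = Enzyme("AccI", "GTCGAC", 2)  # one instance of the degenerate AccI site
EHEI = Enzyme("EheI", "GGCGCC", 3)
SMAI = Enzyme("SmaI", "CCCGGG", 3)

for a, b in [(SALI, XHOI), (BSTBI, ACCI), (EHEI, SMAI)]:
    product_site = fuse(a, b)
    regenerated = product_site in (a.site, b.site)
    print(f"{a.name}/{b.name}: {product_site} (parent site regenerated: {regenerated})")
```

In each of these pairings the fused sequence matches neither parent recognition sequence, which is why a fusion site can be used without reintroducing a cleavable site at the junction.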

The filling-in of a 5' overhang generated by a restriction enzyme is performed using a
DNA polymerase, for example the Klenow fragment of DNA Polymerase I. If the overhang is to
be filled in completely, then all four nucleotides, dATP, dCTP, dGTP, and dTTP, are included in
the reaction. If the overhang is to be only partially filled in, then the appropriate nucleotides are
omitted from the reaction. In item (B) above, the XhoI-digested DNA would be filled in by
Klenow in the presence of dCTP and dTTP and by omitting dATP and dGTP. An order of
cloning steps is determined that allows the use of sites about 150-500 bp apart. Note that a
fragment must lack the recognition sequence for an enzyme only if that enzyme is used to clone
the fragment. For example, the strategy for the construction of the "desired" Factor VIII coding 
sequence can use ApaLI in a number of different places, because of the order of assembly of the 
fragments - ApaLI is not used in any of the later cloning steps. 
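
The partial fill-in described above for item (B) can be modelled in a few lines. This is a simplified sketch, not a protocol: the function `partial_fill_in` is a hypothetical name, the overhang is supplied as the single-stranded template written 5' to 3', and the model assumes the polymerase simply stops at the first position whose complementary nucleotide was left out of the reaction.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def partial_fill_in(overhang: str, dntps: set):
    """Fill in a 5' overhang (written 5'->3') using only the supplied nucleotides.

    Returns (bases_added, remaining_overhang). Synthesis starts at the template
    base next to the duplex, i.e. the 3'-most base of the overhang.
    """
    added = []
    remaining = overhang
    while remaining:
        incoming = remaining[-1].translate(COMPLEMENT)
        if incoming not in dntps:
            break  # the required nucleotide was omitted, so extension stops here
        added.append(incoming)
        remaining = remaining[:-1]
    return "".join(added), remaining


# XhoI leaves a TCGA overhang; BamHI leaves a GATC overhang.
xho_added, xho_left = partial_fill_in("TCGA", {"C", "T"})
bam_added, bam_left = partial_fill_in("GATC", {"G", "A"})
print("XhoI  +" + xho_added, "leaves", xho_left)
print("BamHI +" + bam_added, "leaves", bam_left)
print("compatible two-base overhangs:", revcomp(xho_left) == bam_left)
```

A complete fill-in is the special case in which all four nucleotides are supplied, for example `partial_fill_in("GATC", set("ACGT"))`, which leaves no overhang.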

If no useful sites are available, then a sequence-independent
strategy can be used: fragments are cloned into a DNA construct that contains recognition
sequences for restriction enzymes that cleave outside of their recognition sequence, e.g., BseRI =

GAGGAGNNNNNNNNNN^    (SEQ ID NO:5)
CTCCTCNNNNNNNN^NN    (SEQ ID NO:6)

DNA construct cloning site gene fragment 

The recognition sequence of the enzyme used to clone the fragment will be removed 
when the fragment is released by digestion with, e.g. BseRI, leaving a fragment consisting of 
100% of the desired sequence, which can then be ligated to a similarly generated adjacent gene 
fragment. 
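
The release of a fragment by an enzyme that cuts outside its recognition sequence can be sketched as follows. The function `type_iis_cuts` and the example construct are hypothetical. The cut offsets of 10 and 8 nucleotides are taken from the BseRI diagram above and should be checked against the supplier's data for whichever enzyme is actually used.

```python
def type_iis_cuts(seq: str, site: str = "GAGGAG", top_offset: int = 10, bottom_offset: int = 8):
    """Locate the first occurrence of `site` and return (top_cut, bottom_cut).

    Both values are indices into the top strand; returns None if the site is absent.
    """
    start = seq.find(site)
    if start < 0:
        return None
    end = start + len(site)
    return end + top_offset, end + bottom_offset


# Hypothetical construct: vector sequence, recognition site, a 10-nt spacer,
# then the gene fragment that is to be released.
vector = "ACGT"
spacer = "TTTTTTTTTT"
fragment = "ATGCAGATCGAGCTGAGC"
construct = vector + "GAGGAG" + spacer + fragment

top_cut, bottom_cut = type_iis_cuts(construct)
released_top = construct[top_cut:]

print(released_top)
print("recognition site carried over:", "GAGGAG" in released_top)
print("overhang length:", top_cut - bottom_cut)
```

Because the cut falls downstream of the recognition sequence, the released top strand begins exactly at the fragment and carries none of the site.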

The next step is to synthesize initial restriction fragments. 

The synthesis of the initial restriction fragments can be achieved in a number of ways, 
including, but not limited to: 

1. Chemical synthesis of the entire fragment.

2. Synthesize two oligonucleotides that are complementary at their 3' ends, anneal them, 
and use DNA polymerase Klenow fragment, or equivalent, to extend, giving a double-stranded 
fragment. 

3. Synthesize a number of smaller oligonucleotides, kinase those oligos that have 
internal 5' ends, anneal all oligos and ligate, viz.

5' ---------p---------p--------- 3'
3' ---------p---------p--------- 5'

Techniques 2 and 3 can be used in subsequent steps to join smaller fragments to each 
other. PCR can be used to increase the quantity of material for cloning, but it may lead to an 
increase in the number of mutations. If an error-free fragment is not obtained, then site-directed 
mutagenesis can be used to correct the best isolate. This is followed by concatenation of error- 
free fragments and sequencing of junctions to confirm their precision. 
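
Technique 2 above (two oligonucleotides complementary at their 3' ends, annealed and extended) can be illustrated with a short sketch. The function `anneal_and_extend`, the minimum overlap of 8 nucleotides, and the example sequences are all assumptions made for illustration and are not taken from the oligonucleotides of this application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal_and_extend(top: str, bottom: str, min_overlap: int = 8) -> str:
    """Top strand of the duplex formed by annealing two oligos at their 3' ends and extending.

    Both oligos are written 5'->3'. The longest 3'-terminal overlap of at least
    `min_overlap` bases is used.
    """
    bottom_rc = revcomp(bottom)  # bottom oligo expressed in top-strand orientation
    for n in range(min(len(top), len(bottom_rc)), min_overlap - 1, -1):
        if top[-n:] == bottom_rc[:n]:
            return top + bottom_rc[n:]
    raise ValueError("oligos do not share a usable 3' overlap")


top_oligo = "ATGCAGATCGAGCTGAGCACC"
bottom_oligo = revcomp("GAGCTGAGCACCTGCTTCTTC")  # complementary to the 3' end of top_oligo

duplex_top = anneal_and_extend(top_oligo, bottom_oligo)
print(duplex_top)
print(revcomp(duplex_top))  # the extended bottom strand, 5'->3'
```

The extended product should still be sequenced, as the text notes, since neither synthesis nor extension is error-free.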

Use 

The synthetic nucleic acid sequences of the invention are useful for expressing a protein
normally expressed in a mammalian cell (e.g., for commercial production of
human proteins such as GH, tPA, GLP-1, EPO, α-galactosidase, β-glucoceramidase,
α-iduronidase, α-L-iduronidase, glucosamine-N-sulfatase, alpha-N-acetylglucosaminidase,
acetylcoenzyme A:α-glucosaminide-N-acetyltransferase, N-acetylglucosamine-6-sulfatase,
N-acetylglucosamine-6-sulfatase, β-galactosidase, N-acetylgalactosamine-6-sulfatase,
β-glucuronidase, Factor VIII, and Factor IX). The synthetic nucleic acid sequences of the
invention are also useful for gene therapy. For example, a synthetic nucleic acid sequence
encoding a selected protein can be introduced directly, e.g., via non-viral cell transfection or via
a vector into a cell, e.g., a transformed or a non-transformed cell, which can express the protein
to create a cell which can be administered to a patient in need of the protein. Such cell-based 
gene therapy techniques are described in greater detail in co-pending US applications: USSN 
08/334,797; USSN 08/231,439; USSN 08/334,455; and USSN 08/928,881, which are hereby 
expressly incorporated by reference in their entirety.

Examples

I. Factor VIII Constructs and Uses thereof 
Construction of pXF8.61 

The fourteen gene fragments of the B-domain-deleted-FVIII optimized cDNA listed in 
Table 2 and shown in Figure 5 (Fragment A-Fragment N) were made as follows. 92 
oligonucleotides were made by oligonucleotide synthesis on an ABI 391 synthesizer (Perkin 
Elmer). The 92 oligonucleotides are listed in Table 3. Figure 5 shows how these 92 
oligonucleotides anneal to form the fourteen gene fragments of Table 2. For each strand of each 
gene fragment, the first oligonucleotide (i.e., the most 5') was manufactured with a 5'-hydroxyl
terminus, and the subsequent oligonucleotides were manufactured as 5'-phosphorylated to allow
the ligation of adjacent annealed oligonucleotides. For gene fragments A, B, C, F, G, J, K, L, M
and N, six oligonucleotides were annealed, ligated, digested with EcoRI and HindIII and cloned
into pUC18 digested with EcoRI and HindIII. For gene fragments D, E, H and I, eight
oligonucleotides were annealed, ligated, digested with EcoRI and HindIII and cloned into pUC18
digested with EcoRI and HindIII. This procedure generated fourteen different plasmids,
pAM1A through pAM1N.
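
The oligonucleotide counts given above can be tallied as a consistency check. The sketch below is illustrative only; the variable names are hypothetical, and it assumes that each fragment's oligonucleotides are divided equally between the two strands, so that each fragment has exactly two 5'-hydroxyl oligonucleotides (one per strand).

```python
# Fragments assembled from six and from eight oligonucleotides, as listed in the text.
SIX_OLIGO_FRAGMENTS = "ABCFGJKLMN"
EIGHT_OLIGO_FRAGMENTS = "DEHI"

oligos_per_fragment = {
    **{name: 6 for name in SIX_OLIGO_FRAGMENTS},
    **{name: 8 for name in EIGHT_OLIGO_FRAGMENTS},
}

total_oligos = sum(oligos_per_fragment.values())
# One 5'-hydroxyl oligonucleotide per strand, two strands per fragment.
hydroxyl_oligos = 2 * len(oligos_per_fragment)
phosphorylated_oligos = total_oligos - hydroxyl_oligos

print(len(oligos_per_fragment), "fragments,", total_oligos, "oligonucleotides")
print(hydroxyl_oligos, "with 5'-hydroxyl,", phosphorylated_oligos, "5'-phosphorylated")
```

The tally reproduces the fourteen fragments and 92 oligonucleotides stated in the text.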

Table 2

Fragment   5' end (site, position)    3' end (site, position)    Note

A          NheI            1          ApaI            279
B          ApaI            279        PmlI            544
C          PmlI            544        PmlI            829
D          PmlI            829        BglII(/BamHI)   1172       BamHI site 3' to seq
E          (BglII/)BamHI   1172       BglII           1583
F          BglII           1583       KpnI            1817
G          KpnI            1817       BamHI           2126
H          BamHI           2126       PmlI            2491
I          PmlI            2491       KpnI            3170       ΔBstEII 2661-2955
J          BstEII          2661       BstEII          2955
K          KpnI            3170       ApaI            3482
L          ApaI            3482       SmaI(/EcoRV)    3772
M          (SmaI/)EcoRV    3772       BstEII          4062
N          BstEII          4062       SmaI            4348



In Table 2 the restriction site positions are numbered by the first base of the palindrome; 
numbering begins at the NheI site.
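
The site positions in Table 2 can be used to check that the fragments tile the coding sequence and to estimate the span of each fragment. The sketch below is illustrative: the dictionary is transcribed from Table 2 (the rows for fragments D and E as best recoverable from the source), the names are hypothetical, and each span is simply the difference between the two site positions rather than the exact insert length.

```python
# (5' site position, 3' site position) for each fragment, from Table 2.
FRAGMENTS = {
    "A": (1, 279), "B": (279, 544), "C": (544, 829), "D": (829, 1172),
    "E": (1172, 1583), "F": (1583, 1817), "G": (1817, 2126), "H": (2126, 2491),
    "I": (2491, 3170), "J": (2661, 2955), "K": (3170, 3482), "L": (3482, 3772),
    "M": (3772, 4062), "N": (4062, 4348),
}


def span(name: str) -> int:
    """Distance between the 5' and 3' site positions of a fragment."""
    start, end = FRAGMENTS[name]
    return end - start


# Fragment J lies inside fragment I (the BstEII 2661-2955 interval),
# so it is left out of the end-to-end tiling check.
backbone = [name for name in FRAGMENTS if name != "J"]
contiguous = all(
    FRAGMENTS[a][1] == FRAGMENTS[b][0] for a, b in zip(backbone, backbone[1:])
)
j_inside_i = FRAGMENTS["I"][0] <= FRAGMENTS["J"][0] and FRAGMENTS["J"][1] <= FRAGMENTS["I"][1]

for name in FRAGMENTS:
    print(name, span(name))
print("backbone contiguous:", contiguous, "| J inside I:", j_inside_i)
```

Apart from fragment I, which spans the interval later filled by fragment J, all spans fall within the 150-500 bp spacing mentioned earlier for cloning sites.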


Table 3 


Oligo' 
Name 

Oligo' 
Lengt 
h 

_ — — i 

Oligonucleotide Sequence 

AMIAf 
1 

118 

GTAGAATTCGTAGGCTAGCATCK3AGATCGAGCTG AGC ACCTGC'I'l'C' ITCC 
TGTGCCTGCTGCGCTTCTGCrrCAGCGCCACCCGCCGCTACTACCTGGGCGC 

CGTGGAGCTGAGCTGG (SEQ ID NO:7) 

AMIAf 
2 

104 

GACTACATGCAGAGCGACCTGGGCGAGCrGCCCGTGGACGCCCGCTTCCC 
rrrrrrTGnTGCCCAAGAGCTTCCCCTrCAACACCAGCGTGGTGTACAAGAA 
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GAC (SEQ ID NO: 8) 

AMIAf 
3 

88 

CCTGTTCGTGGAGTTCACCGACCACCTGTTCAACATCGCCAAGCCCCGCC 
CCCCCTGGATGGGCCTGCTGGGCCCCTACAAGCTTTAC (SEQ ID NO: 9) 

AMIAr 
1 

119 

GTAAAGCTTGTAGGGGCCCAGCAGGCC(^TC 
GCGATGTTGAACAGGTGGTCGGTGAACTCCACGAACAGGGTCTTCTTGTAC 

ACCACGCTGGTGTTGAAGG (SEQ ID NO: 10) 

AMIAr 
2 

107 

GGAAGCTCTTGGGCACGCXjGGGGGGGAAGCGKjGCGTCCACGGGCAGCTC 
GCCCAGGTCGCTCTGCATOTAGTCCCAGCTCAGCTCCACGGCGCCCAGGTA 

GTAGCGG (SEQ ID NO: 11) 

AMIAr 

3 

84 

CGGGTGGCGCTGAAGCAGAAGCGCAGCAGGCACAGGAAGAAGCAGGTG 
CTCAGCTCGATCTGCATGCTAGCCTACGAATTCTAC (SEQ ID NO: 12) 

AMIBf 
1 

115 

GTAGAATTCGTAGGGGCCCCACCATCCAGGCCGAGGTGTACGACACCGT 
GGTGATCACCCTGAAGAACATGGCCAGCCACCCCGTGAGCCTGCACGCCGT 
GGGCGTGAGCTACTG (SEQ ID NO: 13) 

AMIBf 

2 

103 

GAAGGCCAGCGAGGGCGCCGAGTACGACGACCAGACCAGCCAGCGCGA 
GAAGGAGGACGACAAGGTGTTCCCCGGCGGCAGCCACACCTACGTGTGGC 
AGGTG (SEQ ID NO: 14) 

AMIBf 

3 

79 

CTGAAGGAGAACGGCCCCATGGCCAGCGACCCCCTGTGCCTGACCTACA 
GCTACCTGAGCCACGTXjCTACAAGCTTTAC (SEQ ID NO: 1 5) 

AMIBr 
1 

107 

GTAAAGCTTGTAGCACGTGGCTCAGGTAGCTGTAGGTCAGGCACAGGGG 
GTCGCTCGCCATGGGGCCGTTCTCCTTCAGCACCTGCCACACGTAGGTGTG 
GCTGCCG(SEQ ID NO: 16) 

AMIBr 

2 

101 

CCGGGGAACACCTTGTCGTCCTCCTTCTCGCGCTGGCT 
TACTCGGCGCCCTCGCTGGCCrrCCAGTAGCTCACGCCCACGGCGTGCAG 

(SEQ ID NO: 17) 

AMIBr 

3 

89 

GCTCACGGGGTCKjCTGGCCATGTTCTTCACKjGTGATCACCACGGTGTCGT 
ACACCTCGGCCTGGATGGTGGGGCCCCTACGAATTCTAC (SEQ ID NO: 18) 

AMICf 
1 

122 

GTAGAATTCGTAGCCACGTGGACCTGGTGAAGGACCTGAACAGCGGCCT 
GATCGGCGCCCTGCIX3GTGTGCCGCGAGGGCAGCCTGGCCAAGGAGAAGA 
CCCAGACCCTGCACAAGTTCATC (SEQ ID NO: 19)

AMICf 
2 

110 

CTGCTGTTCGCCGTGTTCGACGAGGGCAAGAGCTGGCACAGCGAGACCA 
AGAACAG CCTGATGC AGG ACCGCGACGCCGCC AGCGCCCGCGCCTGGCC C 
AAGATGCACAC (SEQ ID NO: 20) 

AMICf 

3 

86 

CG1X5AACGGCTACGTGAACCGCAGCCTGCCCGGCCTGATCGGCTGCCACC 
GCAAGAGCGTGTACTGGCACGTGCTACAAGCTTTAC (SEQ ID NO: 21) 

AMICr 
1 

108 

GTAAAGCTTGTAGCACGTGCCAGTACACGCTCTTGCGGTGGCAGCCGATC 
AGGCCGGGCAGGCTGCGGTTCACGTAGCCGTTCACGGTGTGCATCTTGGGC 
CAGGCGC (SEQ ID NO: 22) 

AMICr 

2 

110 

GGGCGCTGGCGGCGTCGCGGTCCTGCATCAGGCTGTTCTTGGTCTCGCTG 
TG CCAGCTCTTGCCCTCGTCG AAC ACGG CG AACAGC AGGATGA ACTTGTGC 
AGGGTCTGG (SEQ ID NO: 23) 

AMICr 

3 

100

GTCTTCTCCTrGGCCAGGCTGCCCTCGCGGCACACCAGCAGGGCGCCGAT 
CAGGCCGCTGTTCAGGTCCTTCACCAGGTCCACGTGGCTACGAATTCTAC 
(SEQ ID NO: 24) 

AM1Df
1

99

GTAGAATTCGTAGCACGTGATCGGCATGGGCACCACCCCCGAGGTGCAC
AGCATCTTCCTGGAGGGCCACACCTTCCTGGTGCGCAACCACCGCCAGGC
(SEQ ID NO: 25)

AMIDf 
2 

100 

CAGCCTGGAGATCAGCCCCATCACCTTCCTGACCGCCCAGACCCTGCTGA 
TGGACCTGGGCCAGTTCCTGCTGTTCTGCCACATCAGCAGCCACCAGCAC 
(SEQ ID NO: 26) 

AMIDf 

3 

101 

GACGGCATGGAGGCCTACGTGAAGGTGGACAGCTGCCCCGAGGAGCCCC 
AGClGCGCArGAAGAACAACGAGGAGGCCGAGGACTACGACGACGACCTG 
AC (SEQ ID NO: 27) 

AMIDf 
4 

84 

CGACAGCGAGATGGACGTGGTGCGCTTCGACGACGACAACAGCCCCAGC 
TTCATCCAGATCTCTACGGATCCTACAAGCTTTAC (SEQ ID NO: 28) 

AMIDr 
1 

109 

GTAAAGCTTGTAGGATCCGTAGAGATCTGGATGAAGCTGGGGCTGTTGTC
GTCGTCGAAGCGCACCACGTCCATCTCGCTGTCGGTCAGGTCGTCGTCGTA 
GTCCTCGG (SEQ ID NO: 29) 

AMIDr 
2 

101 

CCTCCTCGTTGTTCTTCATGCGCAGCTGGGGCTCCTCGGGGCAGCTGTCCA 
CCTTCACGTAGGCCTCCATGCCGTCGTGCTGGTGGCTGCTGATGTGGCAG 
(SEQ ID NO: 30) 

AMIDr 

3 

102 

AACAG LAGG AACTGG UCUAGG 1 GC ATG AGCAUGG 1 (J 1 UUUUUU1LAUUA 
AGGTGATGGGGCrGATCTCCAGGCTGGCCTGGCGGTGGTTGCGCACCAGG 
AAG (SEQ ID NO: 31) 

AMIDr 
4 

72 

GTGTGGCCCTCCAGGAAGATGCTGTGCACCTCGGGGGTGGTGCCCATGCC 
GATCACGTGCTACGAATTCTAC (SEQ ID NO: 32) 

AMIEf 
1 

122 

GTAGAATTCGTAGGGATCCGCAGCGTGGCCAAGAAGCACCCCAAGACCT 
GGGTGCACTACATCGCCGCCGAGGAGGAGGACTGGGACTACGCCCCCCTG 
GTGCTGGCCCCCGACGACCGCAG (SEQ ID NO: 33) 

AMIEf 
2 

120 

CTACAAGAGCCAGTACCTGAACAACGGCCCCCAGCGCATCGGCCGCAAG 
TACAAGAAGGTGCGCITCATGGCCTACACCGACGAGACC1TCAAGACCCGC 
GAGGCCATCCAGCACGAGAG (SEQ ID NO: 34) 

AMIEf 

3 

115 

CGGCATCCTGGGCCCCCTGCTGTACGGCGAGGTGGGCGACACCCTGCTGA 
TCATCTTCAAGAACCAGGCCAGCCGCCCCTACAACATCTACCCCCACGGCA 
TCACCGACGTGCGC (SEQ ID NO: 35) 

AMIEf 
4 

86 

CCCCTGTACAGCCGCCGCCTGCCCAAGGGCGTGAAGCACCTGAAGGACTT 
CCCCATCCTGCCCGGCGAGATCTCTACAAGCTTTAC (SEQ ID NO: 36) 

AMIEr 
1 

109 

GTAAAGCTTGTAGAGATCTCGCCGGGCAGGATGGGGAAGTCCTTCAGGT 
GCTTCACGCCCTTGGGCAGGCGGCGGCTGTACAGGGGGCGCACGTCGGTG 
ATGCCGTGGG (SEQ ID NO: 37) 

AMIEr 
2 

114 

GGTAGATGTTGTAGGGGCGGCTGGCCTGGTTCTTGAAGATGATCAGCACKj 
GTGTCGCCCACCTCGCCGTACAGCAGGGGGCCCAGGATGCCGCTCTCGTGC 
TGGATGGCCTCGC (SEQ ID NO: 38) 

AM1Er

3 

121 

GGGTCTTGAAGGTCTCGTCGGTGTAGGCCATGAAGCGCACCTTCiTGTAC 
TTGCGGCCGATGCGCTGGGGGCCGTTGTTCAGGTACTGGCTCTTGTAGCTG 
CGGTCGTCGGGGGCCAGCAC (SEQ ID NO: 39) 

AMIEr 

4 

99 

CAGGGGGGCGTAGTCCCAGTCCTCCTCCTCGGCGGCGATGTAGTGCACCC 
AGGTCTTGGGGTGCTTCTTGGCCACGCTGCGGATCCCTACGAATTCTAC 
(SEQ ID NO: 40) 

AM1Ff
1

102

GTAGAATTCGTAGAGATCTTCAAGTACAAGTGGACCGTGACCGTGGAGG
ACGGCCCCACCAAGAGCGACCCCCGCTGCCTGACCCGCTACTACAGCAGCT
TC (SEQ ID NO: 41)

AMIFf 
2 

103 

GTGAACATGGAGCGCGACCTGGCCAGCGGCCTGATCGGCCCCCTGCTGAT 
CTGCTACAAGGAGAGCGTGGACCAGCGCGGCAACCAGATCATGAGCGACA 
AGC (SEQ ID NO: 42) 

AMIFf 
3 

61 

GCAACGTGATCCTGTTCAGCGTGTTCGACGAGAACCGCAGCTGGTACCCT 
ACAAGCTTTAC (SEQ ID NO: 43) 

AMIFr 
1 

87 

GTAAAGCTTGTAGGGTACCAGCTGCGGTTCTCGTCGAACACGCTGAACAG 
GATCACGTTGCGCTTGTCGCTCATGATCTGGTTGCCG (SEQ ID NO: 44) 

AMIFr 
2 

101 

CGCTGGTCCACGCTCTCCTTGTAGCAGATCAGCAGGGGGCCGATCAGGCC 
GCTGGCCAGGTCGCGCTCCATGTTCACGAAGCTGCTGTAGTAGCGGGTCAG 
(SEQ ID NO: 45) 

AMIFr 

3 

78 

GCAGCGGGGGTCGCTCTTGGTGGGGCCGTCCTCCACGGTCACGGTCCACT 
TGTACITGAAGATCTCTACGAATTCTAC (SEQ ID NO: 46) 

AMIGf 
1 

120 

GTAGAATTCGTAGGGTACCTGACCGAGAACATCCAGCGCTTCCTGCCCAA 
CCCCGCCGGCGTGCAGCTGGAGGACCCCGAGTTCCAGGCCAGCAACATCA 
TGCACAGCATCAACGGCTAC (SEQ ID NO: 47)

AMIGf 
2 

126 

GTGTTCGACAGCCTGCAGCTGAGCGTGTGCCTGCACGAGGTGGCCTACTG 
GTACATCCTGAGCATCGGCGCCCAGACCGACTTCCTGAGCGTGTTCTTCAG 
CGGCTACACCTTCAAGCACAAGATG (SEQ ID NO: 48) 

AMIGf 
3 

95 

GTGTACGAGGACACCCTGACCCTGTTCCCCTTCAGCGGCGAGACCGTGTT 
CATGAGCATGGAGAACCCCGGCCTGTGGATCCCTACAAGCTTTAC (SEQ ID 
NO: 49) 

AMIGr 
1 

119 

GTAAAGCTTGTAGGGATCCACAGGCCGGGGTTCTCCATGCTCATGAACAC 
GGTCTCGCCGCTGAAGGGGAACAGGGTCAGGGTGTCCTCGTACACCATCTT 
GTGCTTGAAGGTGTAGCC (SEQ ID NO: 50) 

AMIGr 
2 

124 

GCTGAAGAACACGCTCAGGAAGTCGGTCTGGGCGCCGATGCTCAGGATG 
TACCAGTAGGCCACCTCGTGCAGGCACACGCTCAGCTGCAGGCTGTCGAAC 
ACGTAG CCGTTG ATG CTGTG CATG (SEQ ID NO: 51) 

AMIGr 
3 

98 

A r rG'Il 1 GC r rGGCCrGGAACTCGGGGTCCrCCAGCrGCACGCCGGCGGGG n 1 
GGGCAGGAAGCGCTGGATGTTCTCGGTCAGGTACCCTACGAATTCTAC 
(SEQ ID NO: 52) 

AMIHf 
1 

111 

GTAGAATTCGTAGGGATCCTGGGCTGCCACAACAGCGACTTCCGCAACCG 
CGGCATGACCGCCCTGCTGAAGGTGAGCAGCTGCGACAAGAACACCGGCG 
ACTACTACGAG (SEQ ID NO: 53) 

AMIHf 
2 

102 

GACAGCTACGAGGACATCAGCGCCTACCTGCTGAGCAAGAACAACGCCA 
TCGAGCCCCGCCTGGAGGAGATCACCCGCACCACCCTGCAGAGCGACCAG 
GAG (SEQ ID NO: 54) 

AMIHf 

3 

105 

GAGATCGACTACGACGACACCATCAGCGTGGAGATGAAGAAGGAGGACT 
TCGACATCTACGACGAGGACGAGAA(X:AGA(iCCXXXXiCA(KTrrc:CACTAACT 
AAGACC (SEQ ID NO: 55) 

AMIHf 
4 

79 

CGCCACTACTTCATCGCCGCCGTGGAGCGCCTGTGGGACTACGGCATGAG 
CAGCAGCCCCCACGTGCT ACAAGCTTTAC (SEQ ID NO: 56) 

AMIHr 
1 

101 

GTAAAGCTTGTAGCACGTGGGGGCTGCTGCTCATGCCGTAGTCCCACAGG
CGCTCCACGGCGGCGATGAAGTAGTGGCGGGTCTTCTTCTGGAAGCTGCGG
(SEQ ID NO: 57)

AM1Hr
2

105

GGGCTCTGGTTCTCGTCCTCGTCGTAGATGTCGAAGTCCTCCTTCTTCATC 
TrrAPGrTGATGGTGTCGTCGTAGTCGATCTCCTCCTGGTCGCTCTGCAGGG 
TG (SEQ ID NO: 58) 

AM1Hr
3

108

GTCCGGGTGATCTCCTCCAGGCGGGGCT 
P A GGT ArM^rn(^GATGTCCrCGTAGCTGTCCTCGTAGTAGTCGCCGGTGTT 

CTTGTCG (SEQ ID NO: 59) 

AM1Hr

4

OJ

CAGCTGCTCACCTTCAGCAGGGCGG^
CCCAGGATCCCTACGAATTCTAC (SEQ ID NO: 60)

AM1If
1

115

GTAGAATTCGTAGCACGTGCTGCGCAACCGCGCCCAGAGCGGCAGCGTG 
rrrr apvttp a AG a AGGTGGTGTTrrAGGAGTTCACCGACGGCAGCTTCACC 
CAGCCCCTGTACCGC (SEQ ID NO: 61) 

AM1If
2

111

GGCGAGCTGAACGAGCACCTGGGCCTGCTGCXjCCCCTACA 
a raTtmr; A nr; A r A A P A Tr ATGGTG ArrGTGCAGGAG r rTCGCCCTGTTCTTCA 

J\\JJ\J l VjO^\VJAjT/\v^-rt-rt.V^/lL 1 Vv^V 1 VJ VJ J. VJ^VV^V^VJ J. VvfiVJ vj^twvj x a v^vj v^*w v-- j a ^ ^ 

CCATCTTCGAC (SEQ ID NO: 62) 

AM1If
3

106

GAGACCAAGAGCTGGTACTTCACCGAGAACATGG 
mrrr^rn pa A pa tpp a r; ATf^A GG A CCCC APCTTCAAGG AGAACTACCGU 1 

V^V^L/VvV> 1 VJVaA/Vvv/\ 1 VvV>/\vJ/\ 1 VJVJ/ VvJ VJ/VV^Vy V^VvO.V^Vv 1 X Vy/V-tVVJVJi^.VJXyz i-V^ 1 ^ivv/\jv 

TCCACG (SEQ ID NO: 63) 

AM1If
4

85

CCATCAACGGCTACATCATGGACACCCTGCCCGGCCTGGTGATGGCCCAG 
r^APPAnPOPATPPHPTGOTAPrPTAPAAGCTTTAC (SEOIDNO* 64) 

AM1Ir
1 

115 

GTAAAGCTTGTAGGGTACCAGCGGATGCGCTGGTCCTGGGCCATCACCAG 
GCCGGGCAGGGTGTCCATGATGTAGCCGTTGATGGCGTGGAAGCGGTAGTT 
CTCCTTGAAGGTGG (SEQ ID NO: 65) 

AMIIr 
2 

99 

GGTCCTCCATCTGGATGTO 
TCGGTGAAGTACCAGCTCTTGGTCTCGTCGAAGATGGTGAAGAACAGGG 

(SEQ ID NO: 66) 

AMIIr 

3 

110 

cgaactcctgcacggtcaccatgatg1tgtcctccacctcggcgcggatg 
taggggcccagcaggcccaggtgctcgtc^ 
ctgggtgaag (seq ed no: 67) 

AMIIr 
4 

93 

CrGCCGTCGGTGAACrCCTGGAACACCACCT 
GCCGCTCTGGGCGCGGTTGCGCAGCACGTGCTACGAATTCTAC (SEQ ID 

NO: 68) 

AM1Jf
1 

116 

GTAGAATTCX}TAGGGTGACCTrUUUC 

ck:ctgatcagctacgaggaggaccagcgccagc^ccx:cgagccc<^ (seq 

rDNO: 69) 

AM1Jf
2 

120 

GTGAAGCCCAACGAGACCAAGACCTACTTCTGGAAGGTGCAGCACCACA 
TGGCCCCCACCAAGGACGAGTTCGACTGCAAGGCCTGGGCCTTA,^ 
ACGTGGACCTGGAGAAGGAC (SEQ ID NO: 70) 

AM1Jf

3 

91 

GTGCACAGCGGCCTGATCGGCCCCCTGCTGGTGTGCCACACCAACACCCT 
GAACCCCGCCCACGGCCGCCAGGTGACCCTACAAGCTTTAC (SEQ ID NO: 
71) 

AM1Jr
1 

113 

GrAAAGCrFGl^AGGGICACCTGGCGGCCGTGGGCGGGG'rrCAGGGl'GriG 
GTGTGGCACACCAGCAGGGGGCCGATCAGGCCGCTGTGCACGTCCTTCTCC 

AGGTCCACGTCG (SEQ ID NO: 72) 

AM1Jr
2

121

CTUAAGTAGGCCCAGGCC

AM1Jr
3

93

GTGGTGCTGCACCnTCCAGAAGTAGGTCTTGGTCTCGTTGGGCTrCACGAA 
CtTTCTTGG GGGCTCGGCGC ™q ^ NO 7 ^ 

cciXTGCGCTGcmx:'ix:cTCorA(]rr(iAT(:AG(u:r(;(rr(nA(iAA(Knc;rA(; 

GGGCGGCTCjGCCTGGlTGCGGAAGGTCACCCTACGAAnvrAC (SEQ ID 

NO- 741 

AMIKf 
1 

120 

GTAGAATTCGTAGGGTACCTGCTGAGCATGGGCAGCAACGAGAACATCC 
ACAGCATCCACTTCAGCGGCCACGTGTTCACCGTGCGCAAGAAGGAGGAG 
TAP A AnATGGCrCTGTACAAC fSEO ID NO' 75) 

AMIKf 
2 

122 

CTGTACCCCGGCGTGTTCGAGACCGTGGAGATGCTGCCCAGCAAGGCCGG 
CATCTGGCGCGTGGAGTGCCTGATCGGCGAGCACCTGCACGCCGGCATGA 
an ArTTTnTTfrTrsttTfTrACAG CSEO ID NO' 761 

AMIKf 

3 

102 

CAACAAGTGCCAGACCCCCCTGGGCATGGCCAGCGGCCACATCCGCGAC 
TTCCAGATCACCGCCAGCGGCCAGTACGGCCAGTGGGCCCCTACAAGCTTT 

AMIKr 
1 

123 

GTAAAGCTTGTAGGGGCCCACTGGCCGTACTGGCCGCTGGCGGTGATCTG 
GAAGTCGCGGATGTGGCCGCTGGCCATGCCCAGGGGGGTCTGGCACTTGTT 

AMIKr 
I 

125 

CTCATGCCGGCGTGCAGGTGCTCGCCGATCAGGCACTCCACGCGCCAGAT 
cmrncirc r wft crnncr A oc ATCTCC ACGGTCTCG AACACGCCGGGGTACAG 
GTTGTACAGGGCCATCTTGTACTC (SEQ ID NO: 79) 

AMIKr 

J 

96 

CTCCTTCTTGCGCACGGTGAACACGTGGCCGCTGAAGTGGATGCTGTGGA 
rn n^rrrGTTnrTfirrr ATOrTCAGCAGGTACCCTACGAATTCTAC (SEQ ID 
NO: 80) 

AMILf 
1 

120 

GTAGAATTCGTAGGGGCCCCCAAGCTGGCCCGCCTGCACTACAGCGGCA 
ATPA a rnrrrnn a nr a rr a aggagcccttcagctggatcaaggtggac 
CTGCTGGCCCCCATGATCATC (SEQ ID NO: 81) 

AMILf 

9 

116 

CACGGCATCAAGACCCAGGGCGCCCGCCAGAAGTTCAGCAGCCTGTACA 
TrAGrrAGTTrATCATCATGTACAGCCTGGACGGCAAGAAGTGGCAGACCT 
ACCGCGGCAACAGCAC (SEQ ID NO: 82) 

AMILf 

J 

86 

CGGCA(XCTGATGGTGTTCTTCGGCAACGTGGACAGCAGCGGCATCAAGC 
APA AC ATfTTrAACCCCCCCGGGCTACAAGCTTTAC (SEQ ID NO: 83) 

AMILr 
1 

110

GTAAAGCTTGTAGCCCGGGGGGGTTGAAGATGTTGTGCiTGATGCCGCTG 
CTGTCCACGTTGCCGAAGAACACCATCAGGGTGCCGGTGCTGiTGCCGCGG 
TAGGTCTGC (SEQ ID NO: 84) 

AMILr 
2 

113 

CACTTCTTGCCGTCCAGGCTGTACATGATGATGAACIX3GCTGATGTACAG 
GCTGCTGAACTTCTGGCGGGCGCCCTGGGTCTTGATGCCGTGGATGATCAT 
GGGGGCCAGCAG (SEQ ID NO: 85)

AMILr 

3 

99 

GTCCACCTTGATCCAGCTGAAGGGCTCCTTGGTGCTCCAGGCGTTGATGC 

TGCCGCTGTAGTGCAGGCGGGCCAGCTTGGGGGCCCCTACGAATTCTAC 
(SEQ ID NO: 86) 

AM1Mf
1

122

GTAGAATTCGTAGGATATCATCGCCCGCTACATCCGCCTGCACCCCACCC 
ACTACAGCATCCGCAGCACCCTGCGCATGGAGCTGATGGGCTGCGACCTGA 
ACAGCTGCAGCATGCCCC'IGG (SEQ ID NO: 87) 

AM1Mf
2

112

GCATGGAGAGCAAGGCCATCAGCGACGCCCAGATCACCGCCAGCAGCTA
CTTCACCAACATGTTCGCCACCTGGAGCCCCAGCAAGGCCCGCCTGCACCT
GCAGGGCCGCAG (SEQ ID NO: 88)

AM1Mf
3

89

C AACGCCTGGCGCCCCCAGG 1 GAACAACCCCAAGGAG 1 GGC1 GCAGG I G 
G A(^CCAGAAGACCATGAAGGTGACCCTACAAGCTTTAC (SEQ ID NO: 89) 

AM1Mr
1

112 

GTAAAGCTTGTAGGGTCACCrrCArCiG'J'CriCTGGAAGTCCACCTGCAGC 
CACTCCTTGGGGTTGTTCACCTGGGGGCGCCAGGCGITGCTGCGGCCCrGC 

AGGTGCAGGCG (SEQ ID NO: 90) 

AM1Mr
2

114 

GGCCTTGCTGGGGCTCCAGGTGGCGAACATGTTGGTGAAGTAGCTGGrGG 
CGGTGATCTGGGCGTCGCTGATGGCCrrGCrcrCCATGCCCAGGGGCATGC 

TGCAGCTGTTCAG (SEQ ID NO: 91) 

AM1Mr
3

97 

GTCGCAGCCCATCAGCTCCATGCGCAGGGTGCTGCGGATGCTGTAGTGGG 
TGGGGTGCAGGCGGATGTAGCGGGCGATGATATCCTACGAATTCTAC (SEQ
ID NO: 92)

AMINf 
1 

122 

GTAGAATTCGTAGGGTGACCGGCGTGACCACCCAGGGCGTGAAGAGCCl 
GCTGACCAGCATGTACGTGAAGGAGTTCCTGATCAGCAGCAGCCAGGACG 

AMINf 
2 

104 

C AG AACGG CAAGGTG AA GGTGTTCC AGGG C AAC C AGG AC AGCTTC ACCC 
CCGTGGTGAACAGCCTCiGACCCCCCCCTGCTGACCCGCTACCTGCGCAlCC 

ACCC (SEQ ID NO: 94) 

AMINf 
3 

92 

CCAGAGCTGGGTGCACCAGATCGCCCTGCGCATGGAGGTGCTGGGCTGC 
GAGGCCCAGGACCTGTACTAGCTGCCCGGGCTACAAGCnTAC (SEQ ID 

NO: 95) 

AMINr 
1 

118 

GTAAAGCTTGTAGCCCGGGCAGCTAGTACAGGTCCTGGGCCTCGCAGCCC 
AGCACCTCCATGCGCAGGGCGATCTGGTGCACCCAGCTCTGGGGGTGGATG 

CGCAGGTAGCGGGTCAG (SEQ ID NO: 96) 

AMINr 

2 

100 

"CAGGGGGGGGTCCAGGCIGTTCACCACGGGGGTGAAGCTGTCCTGGTTGC 
CCTGGAACACCTTCACCTTGCCGTTCTGGAAGAACAGGGTCCACTGGTGG 

(SEQ ID NO: 97) 

AMINr 

3 

100 

CCGTCCTGGCTGCrGCTGATCAGGAACTCCTTCACGTACATGCTGGTCAG 
CAGGCTCTTCACGCCCTGGGTGGTCACGCCGGTCACCCTACGAATTCTAC 

(SEQ ID NO: 98) 
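
Table 3 lists a forward (f) and a reverse (r) set of oligonucleotides for each fragment, which suggests that each reverse oligonucleotide anneals across a junction between forward oligonucleotides. The following Python sketch checks this for one pair. The helper name `reverse_complement` is illustrative, and the two sequences are transcribed from the entries for SEQ ID NO: 28 and SEQ ID NO: 29, reading the HindIII site in SEQ ID NO: 29 as AAGCTT (an assumption about one ambiguous character).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Fourth forward oligonucleotide of fragment D (SEQ ID NO: 28), stated length 84.
df4 = ("CGACAGCGAGATGGACGTGGTGCGCTTCGACGACGACAACAGCCCCAGC"
       "TTCATCCAGATCTCTACGGATCCTACAAGCTTTAC")

# First reverse oligonucleotide of fragment D (SEQ ID NO: 29), stated length 109.
dr1 = ("GTAAAGCTTGTAGGATCCGTAGAGATCTGGATGAAGCTGGGGCTGTTGTC"
       "GTCGTCGAAGCGCACCACGTCCATCTCGCTGTCGGTCAGGTCGTCGTCGTA"
       "GTCCTCGG")

# The stated lengths in Table 3 should match the transcribed sequences.
print(len(df4), len(dr1))

# The reverse oligonucleotide should begin with the reverse complement of
# the last forward oligonucleotide and extend past it into the previous one.
print(dr1.startswith(reverse_complement(df4)))
print(len(dr1) - len(df4))
```

The same check can be run on any other forward/reverse pair whose sequences are legible, which is one way to identify individual characters that were damaged in transcription.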


As noted in Table 2 and shown in Figure 5, fragment D was constructed with a BamHI
restriction site placed between the BglII site and the HindIII site at the 3' end of the fragment.
Fragment I was constructed to carry the DNA from PmlI (2491) to BstEII (2661) followed
immediately by the DNA from BstEII (2955) to KpnI (3170), so that the insertion of the BstEII
fragment from pAM1J into the BstEII site of pAM1I in the correct orientation will generate the
desired sequences from 2491 to 3170. Plasmid pAM1B was digested with ApaI and HindIII and
the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1A digested
with ApaI and HindIII, generating plasmid pAM1AB. Plasmid pAM1D was digested with PmlI
and HindIII and the insert was purified by agarose gel electrophoresis and inserted into plasmid
pAM1AB digested with PmlI and HindIII, generating plasmid pAM1ABD. Plasmid pAM1C
was digested with PmlI and the insert was purified by agarose gel electrophoresis and inserted
into plasmid pAM1ABD digested with PmlI, generating plasmid pAM1ABCD; insert orientation
was confirmed by the appearance of a diagnostic 111 bp fragment when digested with MscI.
Plasmid pAM1F was digested with BglII and HindIII and the insert was purified by agarose gel
electrophoresis and inserted into plasmid pAM1E digested with BglII and HindIII, generating
plasmid pAM1EF. Plasmid pAM1G was digested with KpnI and HindIII and the insert was
purified by agarose gel electrophoresis and inserted into plasmid pAM1EF digested with KpnI
and HindIII, generating plasmid pAM1EFG. Plasmid pAM1J was digested with BstEII and the
insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1I digested with
BstEII, generating plasmid pAM1IJ; orientation was confirmed by the appearance of a diagnostic
465 bp fragment when digested with EcoRI and EagI. Plasmid pAM1IJ was digested with PmlI
and HindIII and the insert was purified by agarose gel electrophoresis and inserted into plasmid
pAM1H digested with PmlI and HindIII, generating plasmid pAM1HIJ. Plasmid pAM1M was
digested with EcoRI and BstEII and the insert was purified by agarose gel electrophoresis and
inserted into plasmid pAM1N digested with EcoRI and BstEII, generating plasmid pAM1MN.
Plasmid pAM1L was digested with EcoRI and SmaI and the insert was purified by agarose gel
electrophoresis and inserted into plasmid pAM1MN digested with EcoRI and EcoRV, generating
plasmid pAM1LMN. Plasmid pAM1LMN was digested with ApaI and HindIII and the insert
was purified by agarose gel electrophoresis and inserted into plasmid pAM1K digested with
ApaI and HindIII, generating plasmid pAM1KLMN. Plasmid pAM1EFG was digested with
BamHI and the insert was purified by agarose gel electrophoresis and inserted into plasmid
pAM1ABCD digested with BamHI and BglII, generating plasmid pAM1ABCDEFG; orientation
was confirmed by the appearance of a diagnostic 552 bp fragment when digested with BglII and
HindIII. Plasmid pAM1KLMN was digested with KpnI and HindIII and the insert was purified
by agarose gel electrophoresis and inserted into plasmid pAM1HIJ digested with KpnI and
HindIII, generating plasmid pAM1HIJKLMN. Plasmid pAM1HIJKLMN was digested with
BamHI and HindIII and the insert was purified by agarose gel electrophoresis and inserted into
plasmid pAM1ABCDEFG digested with BamHI and HindIII, generating plasmid pAM1-1.
These cloning steps are depicted in Figure 6. Figure 7 shows the DNA sequence of the insert
contained in pAM1-1 (SEQ ID NO:1). This insert can be cloned into any suitable expression
vector as an NheI-SmaI fragment to generate an expression construct. pXF8.61 (Fig. 4),
pXF8.38 (Fig. 11) and pXF8.224 (Fig. 13) are examples of such a construct.

Construction of pXF8.186

The "LE" version of the B-domain-deleted-FVIII optimized cDNA contained in pAM1-1
was modified by replacing the Leu-Glu dipeptide (2284-2289) at the junction of the heavy and
light chains with four Arginine residues, making a total of five consecutive Arginine residues
(SEQ ID NO:2). This was achieved as follows. The six oligonucleotides shown in Table 4 were
annealed, ligated, digested with EcoRI and HindIII and cloned into pUC18 digested with EcoRI
and HindIII, generating the plasmid pAM8B. Figure 8 shows how these oligonucleotides anneal
to form the requisite DNA sequence. pAM8B was digested with BamHI and BstXI and the
insert was purified by agarose gel electrophoresis and used to replace the BamHI(2126)-
BstXI(2352) fragment of the "LE" version (See Figure 7). Figure 9 shows the sequence of the
resulting cDNA (SEQ ID NO:2). This "5Arg" version of the B-domain-deleted-FVIII optimized
cDNA can be cloned into any suitable expression vector as an NheI-SmaI fragment to generate an
expression construct. pXF8.186 (Figure 3) is an example of such a construct.

Table 4


OLIGO
NAME

OLIGO
LENGTH

OLIGONUCLEOTIDE SEQUENCE

AM8F1 

140 

GTAGAATTCGGATCCTGGGCTGCCACAACAGCGACTT
CCGCAACCGCGGCATGACCGCCCTGCTGAAGGTGAGC
AGCTGCGACAAGAACACCGGCGACTACTACGAGGAC
AGCTACGAGGACATCAGCGCCTACCTGCTG (SEQ ID
NO:99)

AM8BF2 

57 

AGCAAGAACAACGCCATCGAGCCCCGCAGGCGCAGG 
CGCGAGATCACCCGCACCACC (SEQ ID NO: 100) 

AM8F4 

58 

CTGCAGAGCGACCAGGAGGAGATCGACTACGACGAC 
ACCATCAGCGTGGAAGCTTTAC (SEQ ID NO: 101) 

AM8R1

79

GTAAAGCTTCCACGCTGATGGTGTCGTCGTAGTCGAT
CTCCTCCTGGTCGCTCTGCAGGGTGGTGCGGGTGATCT
CGCG (SEQ ID NO:102)

AM8BR2 

57 

CCTGCGCCTGCGGGGCTCGATGGCGTTGTTCTTGCTCA 
GCAGGTAGGCGCTGATGTC (SEQ ID NO:103) 

AM8BR4 

119 

CTCGTAGCTGTCCTCGTAGTAGTCGCCGGTGTTCTTGT 
CGCAGCTGCTCACCTTCAGCAGGGCGGTCATGCCGCG 
GTTGCGGAAGTCGCTGTTGTGGCAGCCCAGGATCCGA
ATTCTAC (SEQ ID NO: 104) 
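
The replacement of the Leu-Glu dipeptide by four arginine codons can be checked by translating oligonucleotide AM8BF2 (SEQ ID NO: 100) and comparing it with the corresponding stretch of the "LE" version. The sketch below is a minimal Python illustration. It uses the standard genetic code and assumes that the reading frame starts at the first base of each sequence shown; the "LE" stretch is transcribed from the entry for SEQ ID NO: 54 in Table 3, and the names `translate`, `five_arg` and `le` are illustrative.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an upper-case DNA string in frame 1, ignoring a partial last codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# AM8BF2 (SEQ ID NO: 100), stated length 57.
five_arg = ("AGCAAGAACAACGCCATCGAGCCCCGCAGGCGCAGG"
            "CGCGAGATCACCCGCACCACC")

# Corresponding stretch of the "LE" version (from SEQ ID NO: 54).
le = "AGCAAGAACAACGCCATCGAGCCCCGCCTGGAGGAGATCACCCGCACCACC"

print(translate(le))
print(translate(five_arg))
```

Under these assumptions the "LE" stretch reads ...Pro-Arg-Leu-Glu-Glu... and the modified stretch reads ...Pro-Arg-Arg-Arg-Arg-Arg-Glu..., that is, Leu-Glu is replaced by four arginines next to the existing arginine.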


Construction of pXF8.36 

The construct for expression of human Factor VIII, pXF8.36 (Fig. 10), is an 11.1 kilobase
circular DNA plasmid which contains the following elements: A cytomegalovirus immediate
early I gene (CMV) 5' flanking region comprised of a promoter sequence, a 5' untranslated
sequence (5' UTS) and first intron sequence for initiation of transcription of the Factor VIII
cDNA. The CMV region is next fused with a wild-type B domain-deleted Factor VIII cDNA
sequence. The Factor VIII cDNA sequence is fused, at the 3' end, with a 0.3 kb fragment of the
human growth hormone 3' untranslated sequence. A transcription termination signal and 3'
untranslated sequence (3' UTS) of the human growth hormone gene is used to ensure processing
of the message immediately following the stop codon. A selectable marker gene (the bacterial
neomycin phosphotransferase (neo) gene) is inserted downstream of the Factor VIII cDNA to
allow selection for stably transfected mammalian cells using the neomycin analog G418.
Expression of the neo gene is under the control of the simian virus 40 (SV40) early promoter.
The pUC19-based amplicon carrying the pBR322-derived β-lactamase (amp) and origin of
replication (ori) allows for the uptake, selection and propagation of the plasmid in E. coli K-12
strains. This region was derived from the plasmid pBSII SK+.

Construction of pXF8.38

The construct for expression of human Factor VIII, pXF8.38 (Fig. 11), is an 11.1 kilobase
circular DNA plasmid which contains the following elements: A cytomegalovirus immediate
early I gene (CMV) 5' flanking region comprised of a promoter sequence, 5' untranslated
sequence (5' UTS) and first intron sequence for initiation of transcription of the Factor VIII
cDNA. The CMV region is next fused with a synthetic, optimally configured B domain-deleted
Factor VIII cDNA sequence. The Factor VIII cDNA sequence is fused, at the 3' end, with a 0.3
kb fragment of the human growth hormone 3' untranslated sequence. A transcription
termination signal and 3' untranslated sequence (3' UTS) of the human growth hormone gene is
used to ensure processing of the message immediately following the stop codon. A selectable
marker gene (the bacterial neomycin phosphotransferase (neo) gene) is inserted downstream of
the Factor VIII cDNA to allow selection for stably transfected mammalian cells using the
neomycin analog G418. Expression of the neo gene is under the control of the simian virus 40
(SV40) early promoter. The pUC19-based amplicon carrying the pBR322-derived β-lactamase
(amp) and origin of replication (ori) allows for the uptake, selection and propagation of the
plasmid in E. coli K-12 strains. This region was derived from the plasmid pBSII SK+.

pXF8.269 Construct

The construct for expression of human Factor VIII (Fig. 12), pXF8.269, is a 14.8 kilobase
(kb) circular DNA plasmid which contains the following elements: A human collagen (I) α2
promoter which contains 0.17 kb of 5' untranslated sequence (5' UTS), aldolase A gene 5'
untranslated sequence (5' UTS) and first intron sequence for initiation of transcription of the
Factor VIII cDNA. The aldolase intron region is next fused with a synthetic, wild-type B
domain-deleted Factor VIII cDNA sequence. A transcription termination signal and 3'
untranslated sequence (3' UTS) of the human growth hormone gene is used to ensure processing
of the message immediately following the stop codon. A selectable marker gene (the bacterial
neomycin phosphotransferase (neo) gene) is inserted downstream of the Factor VIII cDNA to
allow selection for stably transfected mammalian cells using the neomycin analog G418. The
expression of the neo gene is under the control of the SV40 promoter. The pUC19-based
amplicon carrying the pBR322-derived β-lactamase (amp) and origin of replication (ori) allows
for the uptake, selection and propagation of the plasmid in E. coli K-12 strains. This region was
derived from the plasmid pBSII SK+.


pXF8.224 Construct

The construct for expression of human Factor VIII, pXF8.224 (Fig. 13), is a 14.8 kilobase
(kb) circular DNA plasmid which contains the following elements: A human collagen (I) α2
promoter which contains 0.17 kb of 5' untranslated sequence (5' UTS), aldolase A gene 5'
untranslated sequence (5' UTS) and first intron sequence for initiation of transcription of the
Factor VIII cDNA. The aldolase intron region is next fused with a synthetic, optimally
configured B domain-deleted Factor VIII cDNA sequence. A transcription termination signal
and 3' untranslated sequence (3' UTS) of the human growth hormone gene is used to ensure
processing of the message immediately following the stop codon. A selectable marker gene (the
bacterial neomycin phosphotransferase (neo) gene) is inserted downstream of the Factor VIII
cDNA to allow selection for stably transfected mammalian cells using the neomycin analog
G418. The expression of the neo gene is under the control of the SV40 promoter. The
pUC19-based amplicon carrying the pBR322-derived β-lactamase (amp) and origin of replication
(ori) allows for the uptake, selection and propagation of the plasmid in E. coli K-12 strains. This
region was derived from the plasmid pBSII SK+.


Clotting Assay

A clotting assay based on an activated partial thromboplastin time (aPTT) (Proctor, et al.,
Am. J. Clin. Path., 36:212-219, (1961)) was performed to analyze the biological activity of the
BDD hFVIII molecules expressed by constructs in which the BDD-FVIII coding region was
optimized.

Biological Activity as Analyzed Using the Clotting Assay

The results of the aPTT-based clotting assay are presented in Table 5, below. Specific
activity of the hFVIII preparations is presented as aPTT units per milligram hFVIII protein as
determined by ELISA. Both of the human fibroblast-derived BDD hFVIII molecules (5R and
LE) have high specific activity when measured by the aPTT clotting assay. These specific
activities have been determined to be up to 2- to 3-fold higher than those determined for CHO
cell-derived full-length FVIII (as shown in Table 5). An average of multiple determinations of
specific activities for various partially purified preparations of 5R and LE BDD hFVIII also shows
consistently higher values for the BDD hFVIII molecules (11,622 Units/mg for 5R BDD hFVIII,
and 14,561 Units/mg for LE BDD hFVIII as compared to 7097 Units/mg for full-length CHO
cell-derived FVIII). An increased rate and/or extent of thrombin activation has been observed for
various BDD hFVIII molecules, possibly due to an effect of the B-domain to protect the heavy
and light chains from thrombin cleavage and activation (Eaton et al., Biochemistry, 25:8343-
8347, (1986), Meulien et al., Protein Engineering, 2:301-306, (1988)).

Table 5. Specific Activities of Various hFVIII Proteins


hFVIII Product                    Concentration by   aPTT Activity   Specific Activity
                                  ELISA (mg/mL)      (aPTT U/mL)     (aPTT U/mg)

5R BDD hFVIII                     0.050              1306            26,120
LE BDD hFVIII                     0.124              2908            23,452
Full-length (CHO-derived) FVIII   0.158              1454            9202
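
The specific activities in Table 5 follow from dividing the aPTT activity by the ELISA concentration. A minimal Python sketch of that arithmetic is shown below; the names `PREPARATIONS` and `specific_activity` are illustrative and not part of the original description.

```python
# (concentration by ELISA in mg/mL, aPTT activity in U/mL), taken from Table 5.
PREPARATIONS = {
    "5R BDD hFVIII": (0.050, 1306),
    "LE BDD hFVIII": (0.124, 2908),
    "Full-length (CHO-derived) FVIII": (0.158, 1454),
}


def specific_activity(conc_mg_per_ml, activity_u_per_ml):
    """Return specific activity in aPTT units per milligram of protein."""
    return activity_u_per_ml / conc_mg_per_ml


for name, (conc, activity) in PREPARATIONS.items():
    print(f"{name}: {specific_activity(conc, activity):.0f} aPTT U/mg")
```

The computed values agree with the last column of Table 5 to within about one unit per milligram, which is consistent with rounding of the tabulated concentrations.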


Assay for Human Factor VIII in Transfected Cell Culture Supernatants

Samples of cell culture supernatants from cells transfected with wild-type or
optimized BDD human Factor VIII constructs were assayed for human Factor VIII (hFVIII)
content by using an enzyme-linked immunosorbent assay (ELISA). This assay is based on the use
of two non-crossreacting monoclonal antibodies (mAb) in conjunction with samples consisting of
cell culture media collected from the supernatants of transfected human fibroblast cells.
Methods of transfection and identification of positively transfected cells are described in U.S.
Patent No. 5,641,670, which is incorporated herein by reference.

Table 6


Plasmid    Promoter / 5'            Factor VIII cDNA        Mean*    Maximum*   Number of   Fold
           Untranslated Sequence    Composition                                 Strains     Increase

pXF8.36    CMV IE1                  Wild Type               567      2557       38
pXF8.38    CMV IE1                  Optimal Configuration   5403     17106      24          9.5X
pXF8.269   Collagen Iα2 /           Wild Type               382      1227       15
           Aldolase Intron
pXF8.224   Collagen Iα2 /           Optimal Configuration   2022     11930      218         5.3X
           Aldolase Intron

* FVIII mU/10^6 cells/24 hr. ELISA units based on standard curves prepared from pooled normal plasma.
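
The fold increases reported in Table 6 are the ratio of the mean expression obtained with the optimized cDNA to the mean obtained with the wild-type cDNA under the same promoter. The following Python sketch reproduces that calculation; the function name `fold_increase` is illustrative.

```python
def fold_increase(optimized_mean, wild_type_mean):
    """Return the ratio of mean expression, optimized over wild type."""
    return optimized_mean / wild_type_mean


# Mean expression values (FVIII mU/10^6 cells/24 hr) taken from Table 6.
cmv = fold_increase(5403, 567)        # pXF8.38 versus pXF8.36
collagen = fold_increase(2022, 382)   # pXF8.224 versus pXF8.269

print(f"CMV IE1 promoter: {cmv:.1f}X")
print(f"Collagen/aldolase promoter: {collagen:.1f}X")
```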


II. Factor IX Constructs and Uses thereof 

Construction of Synthetic Gene Encoding Clotting Factor IX

The four gene fragments listed in Table 7 and shown in Figure 14 were made by 
automated oligonucleotide synthesis and cloned into plasmid pBS to generate four plasmids, 
pFIXA through pFIXD. 

Table 7

Fragment   5' end                 3' end

A          BamHI        1         StuI(/FspI)   379
B          (StuI/)FspI  379       PflMI         810
C          PflMI        810       PstI          1115
D          PstI         1115      BamHI         1500


As shown in Figure 14, plasmids pFIXA through pFIXD were used to construct
pFIXABCD, which carries the complete synthetic gene. Fragment A was synthesized with a PstI
site 3' to the StuI site, and was cloned as a BamHI - PstI fragment. Plasmid pFIXD was digested
with PstI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into
plasmid pFIXA digested with PstI and HindIII, generating plasmid pFIXAD. Plasmid pFIXB
was digested with EcoRI and PflMI and the insert was purified by agarose gel electrophoresis and
inserted into plasmid pFIXC digested with EcoRI and PflMI, generating plasmid pFIXBC.
Plasmid pFIXBC was digested with FspI and PstI and the insert was purified by agarose gel
electrophoresis and inserted into plasmid pFIXAD digested with StuI and PstI, generating
plasmid pFIXABCD.

Figure 15 shows the DNA sequence of the BamHI insert contained in pFIXABCD. This
insert can be cloned into any suitable expression vector as a BamHI fragment to generate an
expression construct. This example illustrates how a fusion site can be used in the construction
even when there exists an identical sequence in close proximity (Fragments A, B and D all
contain the hexamer "AGGGCA", the product of blunt end ligation of StuI-FspI digested DNA).
This is possible because the resulting fusion sites are not cut by the restriction enzymes used to
create them. This example also illustrates how the gene fragments can be synthesized with
additional restriction sites outside of the actual gene sequence, and these sites can be used to
facilitate intermediate cloning steps.
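
The origin of the "AGGGCA" hexamer can be illustrated with a small Python sketch. It assumes the standard recognition sequences and blunt cut positions for StuI (AGG^CCT) and FspI (TGC^GCA); the names `BLUNT_SITES` and `blunt_fusion` are hypothetical.

```python
# Recognition sequence and cut offset (bases before the cut) for each blunt cutter.
BLUNT_SITES = {
    "StuI": ("AGGCCT", 3),
    "FspI": ("TGCGCA", 3),
}


def blunt_fusion(left_enzyme, right_enzyme):
    """Join the upstream half of one blunt site to the downstream half of another."""
    left_site, left_cut = BLUNT_SITES[left_enzyme]
    right_site, right_cut = BLUNT_SITES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


junction = blunt_fusion("StuI", "FspI")
print(junction)

# The fusion site is recognized by neither enzyme used to create it.
recognition_sequences = [site for site, _ in BLUNT_SITES.values()]
print(junction in recognition_sequences)
```

Because the junction matches neither recognition sequence, other copies of the same hexamer elsewhere in the fragments do not interfere with the digestion steps.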

Expression of Human Factor IX from Optimized and Non-optimized cDNA

The construct for the expression of human Factor IX (Figure 16), pXIX76, is an 8.4
kilobase (kb) circular DNA plasmid which contains the following elements: a cytomegalovirus
(CMV) immediate early I gene 5' flanking region comprising a promoter sequence, 5'
untranslated sequence (5' UTS) and a first intron sequence (equivalent to nucleotides 174328 -
172767 of Genbank Accession X17403). The CMV region is next fused with a wild-type Factor
IX cDNA sequence, with a BamHI site at the junction. The Factor IX cDNA sequence is next
fused to a 1.5 kb fragment from the 3' region of the Factor IX gene that includes the transcription
termination signal (equivalent to nucleotides 34335 - 35857 of Genbank Accession K02402). A
selectable marker gene (the bacterial neomycin phosphotransferase gene (neo)) is inserted
upstream of the CMV sequences to allow selection for stably transfected mammalian cells using
the neomycin analog G418. Expression of the neo gene is under the control of the herpes simplex
virus thymidine kinase promoter. The neo expression cassette is equivalent to nucleotides
452-1596 of Genbank Accession U43612. The pUC19-based amplicon carrying the
pBR322-derived beta-lactamase gene and origin of replication allows for the selection and
propagation of the plasmid in E. coli.

Plasmid pXIX170 containing a Factor IX coding region with an optimized configuration
can be derived from pXIX76 by digestion with BamHI and BclI and insertion of the BamHI
fragment shown in Figure 15, thus producing an equivalent construct that directs the expression
of human Factor IX from an optimized cDNA.

Samples of cell culture supernatants from normal human foreskin fibroblast clones 
transfected with either wild-type or optimized expression constructs were assayed for expression 
of Factor IX. As seen in Table 8, a 2.7-fold increase in mean expression of Factor IX could be 
demonstrated when optimized cDNA was substituted for the wild-type sequence. 


Table 8: Expression data for strains expressing Factor IX

Plasmid    Promoter/5'     cDNA composition        Mean     Maximum    Number of
           untranslated                            (Nanograms/10^6     Cell Strains
           sequence                                cells/24hr)

pXIX76     CMV             Wild Type               418      8384       144
pXIX170    CMV             Optimal Configuration   1127     3316       33


III. Alpha-Galactosidase Constructs and Uses thereof
Construction of a Synthetic Gene Encoding α-Galactosidase

The four gene fragments listed in Table 9 were made by automated oligonucleotide
synthesis and cloned into the vector pUC18 as EcoRI - HindIII fragments (with the N-terminus
of each gene fragment adjacent to the EcoRI site) to generate four plasmids, pAM2A through
pAM2D.


Table 9

Fragment   5' end                  3' end

A          BamHI          1        PstI            364
B          PstI           364      BglII(/BamHI)   697
C          (BglII/)BamHI  697      SmaI(/StuI)     1012
D          (SmaI/)StuI    1012     XhoI            1347


Plasmids pAM2A through pAM2D were used to construct pAM2ABCD, which carries
the complete synthetic gene. Plasmid pAM2B was digested with PstI and HindIII and the insert
was purified by agarose gel electrophoresis and inserted into plasmid pAM2A digested with PstI
and HindIII, generating plasmid pAM2AB. Plasmid pAM2D was digested with StuI and HindIII
and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM2C
digested with SmaI and HindIII, generating plasmid pAM2CD. Plasmid pAM2CD was digested
with BamHI and HindIII and the insert was purified by agarose gel electrophoresis and inserted
into plasmid pAM2AB digested with BglII and HindIII, generating plasmid pAM2ABCD.

Figure 17 shows the DNA sequence of the BamHI-XhoI fragment contained in 
pAM2ABCD. This insert can be cloned into any suitable expression vector as a BamHI-XhoI 
fragment to generate an expression construct. This example illustrates the use of fusion sites that 
arise from the ligation of two complementary overhangs (BglII/BamHI) and from the ligation of 
blunt ends (SmaI/StuI). 
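
The two kinds of fusion site can be illustrated with a short sketch. The Python below is an illustration only and is not part of the construction described here. The recognition sequences and cut positions are the standard published ones for these four enzymes; the table layout and the function names (overhang, fusion_junction, regenerated_sites) are hypothetical.

```python
# Standard recognition sequences (top strand, 5'->3') and the index at which
# the top strand is cut. The dictionary layout and the function names are
# illustrative assumptions, not taken from the text.
SITES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC, 5' overhang GATC
    "BglII": ("AGATCT", 1),  # A^GATCT, 5' overhang GATC
    "SmaI": ("CCCGGG", 3),   # CCC^GGG, blunt
    "StuI": ("AGGCCT", 3),   # AGG^CCT, blunt
}


def overhang(enzyme):
    """Single-stranded overhang left by the enzyme ('' for a blunt cutter).

    Valid for palindromic sites cut at or before the midpoint, which covers
    the four enzymes listed above.
    """
    seq, cut = SITES[enzyme]
    return seq[cut:len(seq) - cut]


def fusion_junction(upstream, downstream):
    """Top-strand sequence formed when the end left by `upstream` on the
    upstream fragment is ligated to the end left by `downstream`."""
    if overhang(upstream) != overhang(downstream):
        raise ValueError(f"{upstream} and {downstream} ends are not compatible")
    up_seq, up_cut = SITES[upstream]
    down_seq, down_cut = SITES[downstream]
    return up_seq[:up_cut] + down_seq[down_cut:]


def regenerated_sites(junction):
    """Names of the listed enzymes whose full site occurs in the junction."""
    return [name for name, (seq, _) in SITES.items() if seq in junction]


for pair in (("BglII", "BamHI"), ("SmaI", "StuI")):
    junction = fusion_junction(*pair)
    print(pair, junction, regenerated_sites(junction))
```

Under these assumptions the BglII/BamHI junction reads AGATCC and the SmaI/StuI junction reads CCCCCT. Neither hexamer matches any of the four parental sites, so none of these enzymes would recut a fusion site.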

Expression of Human α-Galactosidase from Optimized and Non-optimized cDNAs 

The construct for the expression of human α-galactosidase, plasmid pXAG94 (Figure 18), 
is an 8.5 kb circular DNA plasmid which contains the following elements. A selectable marker 
gene (the bacterial neomycin phosphotransferase gene (neo)) is inserted upstream of the 
α-galactosidase expression cassette to allow selection for stably transfected mammalian cells using 
the neomycin analog G418. Expression of the neo gene is under the control of the SV40 early 
promoter. Specifically, the 342 bp PvuII-HindIII fragment equivalent to nucleotides 273 - 
1/5243 - 5172 of GenBank Accession J02400 is fused via an XhoI linker to a fragment equivalent 
to nucleotides 502 - 561 of GenBank Accession J02400, which is next fused to the neo coding 
region, equivalent to nucleotides 350 - 1322 of GenBank Accession U13862. Polyadenylation 
signals for this expression cassette are supplied by sequences 3393 - 3634 of SYNPRSVNEO. 
This selectable marker is fused to a short plasmid sequence, equivalent to nucleotides 2067 
(PvuII) - 2122 of SYNPBR322. 

Expression of the α-galactosidase cDNA is directed from a CMV enhancer (equivalent to 
nucleotides 174253 - 173848 of GenBank Accession X17403). This DNA is fused via the linker 
sequence TCGACAAGCCGAATTCCAGCACACTGGCGGCCGTTACTAGTGGATCCGAG 
(SEQ ID NO: 107) to human elongation factor 1α sequences extending from -207 to +982 
nucleotides relative to the cap site. These sequences provide the EF1α promoter, cap site 
and a 943 nucleotide intron present in the 5' untranslated sequences of this gene. The DNA is 
next fused to the linker sequence 

GAATTCTCTAGATCGAATTCCTGCAGCCCGGGGGATCCACC (SEQ ID NO: 108) 

followed immediately by 335 nucleotides of the human growth hormone gene, starting with the 
ATG initiator codon, equivalent to nucleotides 5225 - 5559 of GenBank Accession J03071. This 
DNA codes for the signal peptide of the hGH gene, including the first intron. 

This DNA is next fused to the portion of the wild-type α-galactosidase cDNA that codes 
for amino acids 31 to 429. The coding region is next fused via the linker 
AAAAAAAAAAAACTCGAGCTCTAG (SEQ ID NO: 109) to the 3' untranslated region of the 
hGH gene, corresponding to nucleotides 6699 - 7321 of GenBank Accession J03071. Finally, 
this DNA is fused to a pUC-based amplicon carrying the pBR322-derived beta-lactamase gene 
and origin of replication, which allows for the selection and propagation of the plasmid in E. coli; 
the sequences are equivalent to nucleotides 229 - 1/2680 - 281 of SYNPUC12V. 

Plasmid pXAG95 is equivalent to pXAG94, with the α-galactosidase cDNA sequence 
replaced with the corresponding optimized configuration sequence (coding for amino acids 31 to 
429) from Figure 17. 

Plasmid pXAG73 (Figure 19) is a 10 kb plasmid similar to pXAG94, but with the 
following differences. The linker sequence 

GCCGAATTCCAGCACACTGGCGGCCGTTACTAGTGGATCCGAG (SEQ ID NO: 110) and 
the adjacent EF1α DNA as far as +30 beyond the cap site have been replaced with the 
mouse metallothionein promoter and cap site (nucleotides -1752 to +54 relative to the mMTI cap 
site). Also, the attachment of the EF1α UTS to the hGH coding sequence differs: EF1α sequences 
extend as far as +973 from the EF1α cap site, followed by the linker CTAGGATCCACC (SEQ 
ID NO: 111), in place of the 

GAATTCTCTAGATCGAATTCCTGCAGCCCGGGGGATCCACC (SEQ ID NO: 108) linker 
described above. 

Plasmid pXAG74 is equivalent to pXAG73, with the wild-type α-galactosidase cDNA 
sequence replaced with the corresponding optimized configuration sequence (coding for amino 
acids 31 to 429) from Figure 17. 

The construction of such plasmids, including the creation of hGH-α-galactosidase 
fusions, is described in U.S. Patent 6,083,725, which is incorporated herein by reference. 


WO 02/064799                                                                PCT/US01/42655 


Samples of cell culture supernatants from normal human foreskin fibroblast clones 
transfected with either wild-type or optimized expression constructs were assayed for expression 
of α-galactosidase. 


Table 10: Expression data for strains expressing alpha-galactosidase 
(mean and maximum expression in units/10^6 cells/24 hr) 

Plasmid    Promoter/5' untranslated sequence    cDNA composition         Mean    Maximum    Number of cell strains
pXAG-73    CMV/mMT/EF1α                         Wild Type                323     752        12
pXAG-74    CMV/mMT/EF1α                         Optimal Configuration    1845    8586       27
pXAG-94    CMV/EF1α                             Wild Type                417     1758       39
pXAG-95    CMV/EF1α                             Optimal Configuration    842     3751       75


As shown in Table 10, 2.0- and 5.7-fold increases in mean α-galactosidase expression were 
seen when optimized cDNA was expressed from the EF1α (pXAG-95) and mMTI (pXAG-74) 
promoters, respectively, when compared to wild-type coding sequences. Furthermore, 
increases in maximum expression were also seen when the optimized cDNA was 
expressed from either promoter (8586 versus 752 units/10^6 cells/24 hr for pXAG-74 versus 
pXAG-73, and 3751 versus 1758 units/10^6 cells/24 hr for pXAG-95 versus pXAG-94). 
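
The fold increases can be checked directly from the mean values in Table 10. The following Python sketch is offered only as a worked calculation. The numbers are copied from the table; the dictionary layout and the function name fold_increase are illustrative assumptions.

```python
# Mean expression values copied from Table 10 (units/10^6 cells/24 hr).
# The data layout and function name are illustrative assumptions.
MEAN_EXPRESSION = {
    "pXAG-73": 323,   # CMV/mMT/EF1a promoter, wild-type cDNA
    "pXAG-74": 1845,  # CMV/mMT/EF1a promoter, optimal configuration
    "pXAG-94": 417,   # CMV/EF1a promoter, wild-type cDNA
    "pXAG-95": 842,   # CMV/EF1a promoter, optimal configuration
}

# (wild-type plasmid, optimized plasmid) pairs that share a promoter.
PAIRS = [("pXAG-73", "pXAG-74"), ("pXAG-94", "pXAG-95")]


def fold_increase(wild_type, optimized):
    """Ratio of mean expression, optimized construct over wild type."""
    return MEAN_EXPRESSION[optimized] / MEAN_EXPRESSION[wild_type]


for wt, opt in PAIRS:
    print(f"{opt} vs {wt}: {fold_increase(wt, opt):.1f}-fold")
```

The ratios come to about 5.7 for pXAG-74 over pXAG-73 and about 2.0 for pXAG-95 over pXAG-94.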


All patents and other references cited herein are hereby incorporated by reference. 

Equivalents 

Those skilled in the art will recognize, or be able to ascertain using no more than routine 
experimentation, many equivalents to the specific embodiments of the invention described 
herein. Such equivalents are intended to be encompassed by the following claims. 

What is claimed is: 

1. A synthetic nucleic acid sequence which encodes a protein wherein at least one 
non-common codon or less-common codon has been replaced by a common codon, and 
having one or more of the following properties: 

(i) the synthetic nucleic acid sequence comprises a continuous stretch of at 
least 150 codons all of which are common codons; 

(ii) the synthetic nucleic acid sequence comprises a continuous stretch of 
common codons, which continuous stretch includes at least 60% or more of the codons in 
the synthetic nucleic acid sequence; or 

(iii) wherein at least 98% or more of the codons in the sequence encoding 
the protein are common codons and wherein the synthetic nucleic acid sequence encodes 
a protein of at least about 90 amino acids in length. 
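
The three alternative properties can be computed mechanically for a candidate coding sequence. The Python sketch below is a rough illustration and not a legal test of the claim. The set of "common" codons is an assumption chosen to match the codons used in the optimized sequences shown in the figures (one codon per amino acid, plus the single codons for Met and Trp); "at least about 90 amino acids" is simplified to 90 codons; all function names are hypothetical.

```python
# ASSUMPTION: one "common" codon per amino acid plus ATG (Met) and TGG (Trp).
# The claim text does not list this set; it is chosen here to match the codons
# used in the optimized sequences shown in the figures.
COMMON_CODONS = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "ATG", "TTC", "CCC", "AGC", "ACC", "TGG", "TAC", "GTG",
}


def split_codons(seq):
    """Split an in-frame coding sequence (no stop codon) into codons."""
    seq = seq.upper()
    if not seq or len(seq) % 3:
        raise ValueError("sequence must be non-empty and a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_stats(seq):
    """Codon count, longest run of common codons, and common fractions."""
    flags = [codon in COMMON_CODONS for codon in split_codons(seq)]
    longest = run = 0
    for is_common in flags:
        run = run + 1 if is_common else 0
        longest = max(longest, run)
    total = len(flags)
    return {
        "codons": total,
        "longest_run": longest,
        "longest_run_fraction": longest / total,
        "common_fraction": sum(flags) / total,
    }


def has_listed_property(seq):
    """True if any of properties (i), (ii) or (iii) holds (simplified reading)."""
    s = codon_stats(seq)
    return (
        s["longest_run"] >= 150                                # (i)
        or s["longest_run_fraction"] >= 0.60                   # (ii)
        or (s["common_fraction"] >= 0.98 and s["codons"] >= 90)  # (iii)
    )


# First nine codons of the optimized Factor IX coding region in this document.
print(codon_stats("ATGCAGCGCGTGAACATGATCATGGCC"))
```

A sequence fragment scored this way is only as meaningful as the assumed codon set, so the set should be replaced with the definition of common codons given in the specification before the result is relied on.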

2. The synthetic nucleic acid sequence of claim 1, wherein all of the non-common 
and less-common codons of the synthetic nucleic acid sequence encoding a protein have 
been replaced with common codons. 

3. The synthetic nucleic acid of claim 1, wherein the number of non-common or 
less-common codons replaced or remaining is between one and 15. 

4. The synthetic nucleic acid of claim 1, wherein the synthetic nucleic acid encodes 
Factor VIII. 

5. The synthetic nucleic acid sequence of claim 1 wherein the synthetic nucleotide 
encodes a Factor VIII having one or more of the following characteristics: 

a) the B domain is deleted (beta domain deleted (BDD) factor VIII); 

b) it has a recognition site for an intracellular protease of the PACE/furin class; or 

c) it is expressed in a non-transformed cell. 

6. The synthetic nucleic acid sequence of claim 4, wherein all non-common and 
less-common codons are replaced with common codons. 

7. The synthetic nucleic acid of claim 1, wherein the synthetic nucleic acid encodes 
Factor IX. 

8. The synthetic nucleic acid sequence of claim 1, wherein the synthetic nucleic acid 
encodes a factor IX polypeptide having one or more of the following characteristics: 

a) it has a PACE/furin site at a pro-peptide mature protein junction; or 

b) is expressed in a non-transformed cell. 

9. The synthetic nucleic acid sequence of claim 7, wherein all non-common and 
less-common codons are replaced with common codons. 

10. A vector comprising the synthetic nucleic acid sequence of claim 1, 2, 3 or 4. 

11. A cell comprising the nucleic acid sequence of claim 10. 

12. A synthetic nucleic acid sequence which encodes alpha-galactosidase, wherein at 
least one non-common codon or less-common codon has been replaced by a common 
codon and wherein the synthetic nucleic acid has one or more of the following properties: 

(a) it has a continuous stretch of at least 90 codons all of which are common 
codons; 

(b) it has a continuous stretch of common codons which comprise at least 33% of 
the codons of the synthetic nucleic acid sequence; 

(c) at least 90% or more of the codons in the sequence encoding the protein are 
common codons and the synthetic nucleic acid sequence encodes a protein of at least 
about 90 amino acids in length; 

(d) it is at least 80 base pairs in length. 

13. The synthetic nucleic acid sequence of claim 12, wherein the alpha-galactosidase 
nucleic acid is inserted into a non-transformed cell. 

14. The synthetic nucleic acid sequence of claim 12, wherein the number of 
non-common or less-common codons remaining is less than 15. 

15. The synthetic nucleic acid sequence of claim 12, wherein all non-common or 
less-common codons are replaced with common codons. 

16. A vector comprising the synthetic nucleic acid sequence of claim 12, 14 or 15. 

17. A cell comprising the nucleic acid sequence of claim 16. 

18. A method of producing alpha-galactosidase comprising culturing the cell of 
claim 17 under conditions in which the nucleic acid is expressed. 

19. A method for preparing a synthetic nucleic acid sequence encoding alpha- 
galactosidase which is at least 90 codons in length, comprising: 

(a) identifying a non-common codon and a less-common codon in a non- 
optimized gene sequence which encodes an alpha-galactosidase protein; and 

(b) replacing at least 90% of the non-common and less-common codons with a 
common codon encoding the same amino acid as the replaced codon. 

20. The method of claim 19, wherein at least 94% of the non-common and less- 
common codons are replaced with a common codon encoding the same amino acid as the 
replaced codon. 

21. A method of providing a subject with alpha-galactosidase, comprising: 

(a) providing a synthetic nucleic acid sequence that can direct the synthesis of an 
optimized message for alpha-galactosidase; 

(b) introducing the synthetic nucleic acid sequence into the subject; and 

(c) allowing the subject to express the alpha-galactosidase, thereby providing the 
subject with the alpha-galactosidase. 

22. The method of claim 21, wherein the synthetic nucleic acid is introduced into a 
cell. 

23. The method of claim 22, wherein the cell can be an autologous, allogeneic, or 
xenogeneic cell. 

24. The method of claim 21, wherein at least 98%, or all of the codons in the 
synthetic nucleic acid sequence are common codons. 

25. The method of claim 21, wherein the subject has a disorder characterized by an 
alpha-galactosidase deficiency. 

26. A method for preparing a synthetic nucleic acid sequence which is at least 90 
codons in length, comprising: 

identifying a non-common codon and a less-common codon in a non-optimized 
gene sequence which encodes a protein and is at least 90 codons in length; and 

replacing at least 98% of the non-common and less-common codons with a 
common codon encoding the same amino acid residue as the replaced codon. 
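
The identify-and-replace steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to make the sequences in this document. The genetic code is the standard one; the mapping of each amino acid to a single "common" codon is an assumption chosen to match the optimized sequences in the figures; the function names are hypothetical. The sketch replaces every replaceable codon, which is one way of exceeding a percentage threshold.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# ASSUMPTION: one "common" codon per amino acid, chosen to match the codons
# used in the optimized sequences shown in the figures.
COMMON_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "M": "ATG", "F": "TTC", "P": "CCC", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAC", "V": "GTG",
}


def translate(seq):
    """Translate an in-frame DNA sequence with the standard genetic code."""
    seq = seq.upper()
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def replace_with_common_codons(seq):
    """Return (new_sequence, number_of_codons_replaced).

    Each codon that is not the assumed common codon for its amino acid is
    replaced by that common codon; stop codons are left unchanged.
    """
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out, replaced = [], 0
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        preferred = COMMON_CODON.get(GENETIC_CODE[codon], codon)
        if preferred != codon:
            replaced += 1
        out.append(preferred)
    return "".join(out), replaced


original = "ATGCAACGTGTTAAT"  # hypothetical input encoding Met-Gln-Arg-Val-Asn
optimized, count = replace_with_common_codons(original)
print(optimized, count, translate(original) == translate(optimized))
```

Comparing the translations before and after, as in the last line, is a simple guard that each replacement encodes the same amino acid as the codon it replaced.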

[Figure: nucleotide sequence of a synthetic Factor VIII coding region running from an EcoRI 
site to a HindIII site, numbered from 1 to beyond 4321, with the encoded amino acid sequence 
(Met 1 onward) printed beneath each line; the legible translation reads 
...Glu-Pro-Arg-Leu-Glu-Glu-Ile-Thr-Arg... near residue 760. Labeled restriction sites: EcoRI, 
NheI, ApaI, MscI, PmlI, EagI, BglII, KpnI, BamHI, BstXI, BstEII, SmaI and HindIII, together 
with the fusion sites (BglII/BamHI) and (SmaI/EcoRV). The sequence characters are not legible 
in this copy.] 

[Figure: nucleotide sequence of a second synthetic Factor VIII coding region running from an 
EcoRI site to a HindIII site, numbered from 1 to beyond 4321, with the encoded amino acid 
sequence (Met 1 onward) printed beneath each line; the legible translation reads 
...Glu-Pro-Arg-Arg-Arg-Arg-Arg-Glu-Ile-Thr-Arg... near residue 760. Labeled restriction sites: 
EcoRI, NheI, MscI, PmlI, EagI, ApaI, BglII, KpnI, BamHI, BstXI, BstEII, SmaI and HindIII, 
together with the fusion sites (BglII/BamHI) and (SmaI/EcoRV). The sequence characters are 
not legible in this copy.] 

GGATCCATGCAGCGCGTGAACATGATCATGGCCGAGAGCCCCGGCCTGATCACCATCTG 

CCTGCTGGGCTACCTGCTGAGCGCCGAGTGCACCGTGTTCCTGGACCACGAGAACGCCA 

ACAAGATCCTGAACCGCCCCAAGCGCTACAACAGCGGCAAGCTGGAGGAGTTCGTGCAG 

GGCAACCTGGAGCGCGAGTGCATGGAGGAGAAGTGCAGCTTCGAGGAGGCCCGCGAGGT 

GTTCGAGAACACCGAGCGCACCACCGAGTTCTGGAAGCAGTACGTGGACGGCGACCAGT 

GCGAGAGCAACCCCTGCCTGAACGGCGGCAGCTGCAAGGACGACATCAACAGCTACGAG 

TGCTGGTGCCCCTTCGGCTTCGAGGGCAAGAACTGCGAGCTGGACGTGACCTGCAACAT 

CAAGAACGGCCGCTGCGAGCAGTTCTGCAAGAACAGCGCCGACAACAAGGTGGTGTGCA 

GCTGCACCGAGGGCTACCGCCTGGCCGAGAACCAGAAGAGCTGCGAGCCCGCCGTGCCC 

TTCCCCTGCGGCCGCGTGAGCGTGAGCCAGACCAGCAAGCTGACCCGCGCCGAGACCGT 

GTTCCCCGACGTGGACTACGTGAACAGCACCGAGGCCGAGACCATCCTGGACAACATCA 

CCCAGAGCACCCAGAGCTTCAACGACTTCACCCGCGTGGTGGGCGGCGAGGACGCCAAG 

CCCGGCCAGTTCCCCTGGCAGGTGGTGCTGAACGGCAAGGTGGACGCCTTCTGCGGCGG 

CAGCATCGTGAACGAGAAGTGGATCGTGACCGCCGCCCACTGCGTGGAGACCGGCGTGA 

AGATCACCGTGGTGGCCGGCGAGCACAACATCGAGGAGACCGAGCACACCGAGCAGAAG 

CGCAACGTGATCCGCATCATCCCCCACCACAACTACAACGCCGCCATCAACAACTACAA 

CCACGACATCGCCCTGCTGGAGCTGGACGAGCCCCTGGTGCTGAACAGCTACGTGACCC

CCATCTGCATCGCCGACAAGGAGTACACCAACATCTTCCTGAAGTTCGGCAGCGGCTAC

GTGAGCGGCTGGGGCCGCGTGTTCCACAAGGGCCGCAGCGCCCTGGTGCTGCAGTACCT 

GCGCGTGCCCCTGGTGGACCGCGCCACCTGCCTGCGCAGCACCAAGTTCACCATCTACA 

ACAACATGTTCTGCGCCGGCTTCCACGAGGGCGGCCGCGACAGCTGCCAGGGCGACAGC 

GGCGGCCCCCACGTGACCGAGGTGGAGGGCACCAGCTTCCTGACCGGCATCATCAGCTG 

GGGCGAGGAGTGCGCCATGAAGGGCAAGTACGGCATCTACACCAAGGTGAGCCGCTACG 

TGAACTGGATCAAGGAGAAGACCAAGCTGACCTAATGAAAGATGGATTTCCAAGGTTAA 

TTCATTGGAATTGAAAATTAACAGGGCCTCTCACTAACTAATCACTTTCCCATCTTTTG 

TTAGATTTGAATATATACATTCTAGGATCC 
GGATCCGCTAGAGCGGAAATTTATGCTGTCCGGTCACCGTGACAATGCAGCTGCGCAAC

CCCGAGCTGCACCTGGGCTGCGCCCTGGCCCTGCGCTTCCTGGCCCTGGTGAGCTGGGA 

CATCCCCGGCGCCCGCGCCCTGGACAACGGCCTGGCCCGCACCCCCACCATGGGCTGGC 

TGCACTGGGAGCGCTTCATGTGCAACCTGGACTGCCAGGAGGAGCCCGACAGCTGCATC 

AGCGAGAAGCTGTTCATGGAGATGGCCGAGCTGATGGTGAGCGAGGGCTGGAAGGACGC 

CGGCTACGAGTACCTGTGCATCGACGACTGCTGGATGGCCCCCCAGCGCGACAGCGAGG 

GCCGCCTGCAGGCCGACCCCCAGCGCTTCCCCCACGGCATCCGCCAGCTGGCCAACTAC 

GTGCACAGCAAGGGCCTGAAGCTGGGCATCTACGCCGACGTGGGCAACAAGACCTGCGC 

CGGCTTCCCCGGCAGCTTCGGCTACTACGACATCGACGCCCAGACCTTCGCCGACTGGG 

GCGTGGACCTGCTGAAGTTCGACGGCTGCTACTGCGACAGCCTGGAGAACCTGGCCGAC 

GGCTACAAGCACATGAGCCTGGCCCTGAACCGCACCGGCCGCAGCATCGTGTACAGCTG 

CGAGTGGCCCCTGTACATGTGGCCCTTCCAGAAGCCCAACTACACCGAGATCCGCCAGT 

ACTGCAACCACTGGCGCAACTTCGCCGACATCGACGACAGCTGGAAGAGCATCAAGAGC

ATCCTGGACTGGACCAGCTTCAACCAGGAGCGCATCGTGGACGTGGCCGGCCCCGGCGG

CTGGAACGACCCCGACATGCTGGTGATCGGCAACTTCGGCCTGAGCTGGAACCAGCAGG 

TGACCCAGATGGCCCTGTGGGCCATCATGGCCGCCCCCCTGTTCATGAGCAACGACCTG 

CGCCACATCAGCCCCCAGGCCAAGGCCCTGCTGCAGGACAAGGACGTGATCGCCATCAA 

CCAGGACCCCCTGGGCAAGCAGGGCTACCAGCTGCGCCAGGGCGACAACTTCGAGGTGT

GGGAGCGCCCCCTGAGCGGCCTGGCCTGGGCCGTGGCCATGATCAACCGCCAGGAGATC

GGCGGCCCCCGCAGCTACACCATCGCCGTGGCCAGCCTGGGCAAGGGCGTGGCCTGCAA 

CCCCGCCTGCTTCATCACCCAGCTGCTGCCCGTGAAGCGCAAGCTGGGCTTCTACGAGT 

GGACCAGCCGCCTGCGCAGCCACATCAACCCCACCGGCACCGTGCTGCTGCAGCTGGAG 

AACACCATGCAGATGAGCCTGAAGGACCTGCTGTAAAAAAAAAAAAAACTCGAG 